This paper considers a class of nonparametric autoregressive models
with nonstationarity. We propose a nonparametric kernel test for
the conditional mean and then establish an asymptotic distribution of
the proposed test. Both the setting and the results differ from earlier
work on nonparametric autoregression with stationarity. In addition,
we develop a new bootstrap simulation scheme for the selection of
a suitable bandwidth parameter involved in the kernel test as well
as the choice of a simulated critical value. The finite-sample
performance of the proposed test is assessed using one simulated example
and one real data example.

1. Introduction. Time series regression analysis has a long history. Many
studies have used parametric linear autoregressive moving
average models [Brockwell and Davis (1990)], parametric nonlinear time
series models [see, e.g., Tong (1990), Granger and Teräsvirta (1993)], and
nonparametric and semiparametric time series models [Tong (1990), Fan and 
Yao (2003) and Gao (2007)]. In many existing studies, particularly in the 
nonparametric situation, the focus of attention has been on the case where 
the observed time series satisfies a type of stationarity. Such a stationarity 
assumption is quite restrictive in many cases. 

In the parametric time series case, estimation and specification testing
methods have been developed to deal with nonstationarity. In recent years,
attempts have also been made to estimate nonlinear and nonstationary
time series models using nonparametric methods. Existing studies
include Phillips and Park (1998) and Karlsen and Tjøstheim (1998, 2001) on
nonparametric autoregression, Park and Phillips (2001) on parametric
nonlinear regression, Bandi and Phillips (2003) on nonparametric estimation of
nonstationary diffusion models, Wang and Phillips (2009) on nonparametric
kernel estimation of random walk processes, and Karlsen, Myklebust and
Tjøstheim (2007) on nonparametric cointegration. In the original version of
this paper, Gao et al. (2006) discuss specification testing problems for both
autoregression and cointegration cases with nonstationarity.

In the field of model specification testing with nonstationarity, there is a 
huge literature on various unit root tests in the parametric linear autore- 
gressive case. To the best of our knowledge, there seems to be very little 
work on specification testing in the nonparametric nonlinear autoregressive 
case. This paper aims to discuss such issues. Consider a class of nonlinear 
autoregressive models of the form 

(1.1)    $X_t = g(X_{t-1}) + u_t, \quad t = 1, 2, \ldots, T,$

where $g(\cdot)$ is an unknown function defined over $R^1 = (-\infty, \infty)$, $\{u_t\}$ is a
sequence of independent and identically distributed (i.i.d.) errors with mean
zero and finite variance $\sigma_u^2 = E[u_1^2]$, and $T$ is the number of observations.
The initial value $X_0$ of $X_t$ may be any $O_P(1)$ random variable. However, we
set $X_0 = 0$ to avoid some unnecessary complications in exposition.

When $g(X_{t-1}) = X_{t-1} + g_1(X_{t-1})$ with $g_1(\cdot)$ being an identifiable
nonlinear function, model (1.1) becomes a nonlinear random walk model. Granger,
Inoue and Morin (1997) discuss some parametric cases for this model, and
suggest several estimation procedures. As $g_1(\cdot)$ usually represents some kind
of nonlinear fluctuation in the conditional mean, it would be both
theoretically and practically useful to test whether such a nonlinear term is
significant before using model (1.1) in practice. We therefore propose testing the
following null hypothesis:

(1.2)    $H_0: P(g(X_{t-1}) = X_{t-1}) = 1$ for all $t \ge 1$.

The main difference between our approach and existing ones is that we
need not prespecify $g(x)$ parametrically as $g(x) = \theta x$ and then test $H_0: \theta = 1$,
as has been done in the literature. Instead, we test $H_0$ nonparametrically.
In doing so, we can avoid possibly misspecifying the true model
before using a specification testing procedure.

The main contributions of this paper are as follows: 

(i) It proposes a nonparametric kernel test for nonlinear nonstationar- 
ity against nonlinear stationarity in model (1.1). This test procedure corre- 
sponds to the well-known test proposed by Dickey and Fuller (1979) for the 
parametric case.

(ii) It establishes an asymptotically normal test for the conditional
mean in model (1.1) under the null hypothesis. Theoretical properties
of the proposed test procedure are established.

(iii) It discusses the power function of the proposed test under a
stationary alternative. Some asymptotic consistency results under both
the null and alternative hypotheses are established.

(iv) In order to implement the proposed test in practice, we develop a new 
simulation procedure based on the assessment of both the size and power 
functions of the proposed test. 

The rest of the paper is organized as follows. Section 2 establishes a simple 
nonparametric test and an asymptotic distribution under the null hypoth- 
esis. Discussion about the power function of the proposed test is given in 
Section 3. Section 4 shows how to implement the proposed test in practice. 
Section 5 concludes the paper with some remarks on extensions. Mathemat- 
ical details are relegated to the Appendix. Some additional derivations are 
given in Appendices B-E of Gao et al. (2008). 

2. Nonparametric unit root test. Consider model (1.1) and a general 
testing problem of the form 

         $H_0: P(g(X_{t-1}) = X_{t-1}) = 1$ against
(2.1)
         $H_1: P(g(X_{t-1}) = X_{t-1} + \Delta_T(X_{t-1})) = 1,$

where $\{\Delta_T(x)\}$ is a sequence of unknown functions.

Before proposing our test statistic for (2.1), we consider the conventional
Nadaraya-Watson (NW) kernel estimate of the form

(2.2)    $\hat g(x) = \sum_{s=1}^{T} W_T(x, X_{s-1}) X_s = \frac{\sum_{s=1}^{T} K_h(X_{s-1} - x) X_s}{\sum_{t=1}^{T} K_h(X_{t-1} - x)},$

where $W_T(x, X_{s-1}) = \frac{K_h(X_{s-1} - x)}{\sum_{t=1}^{T} K_h(X_{t-1} - x)}$, in which $K_h(\cdot) = K(\cdot/h)$, $K(\cdot)$ is a
probability kernel function and $h$ is a bandwidth parameter.
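
The estimate (2.2) can be sketched numerically as follows. This is a minimal illustration only, assuming the uniform kernel $K(u) = \frac{1}{2} I_{[-1,1]}(u)$; the function names `uniform_kernel` and `nw_estimate` are hypothetical and not part of the paper.

```python
import numpy as np

def uniform_kernel(u):
    """Uniform kernel K(u) = 1/2 on [-1, 1] and zero elsewhere (an assumed choice)."""
    return 0.5 * (np.abs(u) <= 1.0)

def nw_estimate(x, lagged, response, h, kernel=uniform_kernel):
    """Nadaraya-Watson estimate of g at the points x, as in (2.2).

    lagged holds X_{s-1}, response holds X_s, and h is the bandwidth.
    Points with no observation inside the kernel window return NaN.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    lagged = np.asarray(lagged, dtype=float)
    response = np.asarray(response, dtype=float)
    # weights[i, s] = K_h(X_{s-1} - x_i) with K_h(.) = K(./h)
    weights = kernel((lagged[None, :] - x[:, None]) / h)
    denom = weights.sum(axis=1)
    estimate = np.full(x.shape, np.nan)
    ok = denom > 0
    estimate[ok] = (weights[ok] @ response) / denom[ok]
    return estimate
```

With the uniform kernel the estimate is simply the average of the responses $X_s$ whose lagged values $X_{s-1}$ lie within $h$ of $x$.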

Let $\tilde A(X_{t-1}, X_{s-1}) = \frac{1}{T} \sum_{k=1}^{T} W_T(X_{k-1}, X_{t-1}) W_T(X_{k-1}, X_{s-1})$ and
$\tilde X_{t-1} = \sum_{s=1}^{T} W_T(X_{t-1}, X_{s-1}) X_{s-1}$. We then compare $\hat g(X_{t-1})$ with $\tilde X_{t-1}$ by

$N_T(h) = N_T(X_1, \ldots, X_T; h) = \frac{1}{T} \sum_{t=1}^{T} [\hat g(X_{t-1}) - \tilde X_{t-1}]^2$

        $= \frac{1}{T} \sum_{s=1}^{T} \sum_{t=1}^{T} \left( \sum_{k=1}^{T} W_T(X_{k-1}, X_{t-1}) W_T(X_{k-1}, X_{s-1}) \right) u_t u_s$

        $= \sum_{t=1}^{T} \sum_{s=1}^{T} \tilde A(X_{t-1}, X_{s-1}) u_t u_s,$

where $u_t = X_t - X_{t-1}$ under $H_0$. Similar forms have been used for the
stationary time series case [see, e.g., Hjellvik, Yao and Tjøstheim (1998)]. Other
alternatives to $N_T(h)$, including the introduction of $M_T(h)$ below, are
discussed in Gao et al. (2006).

In theory, we can derive a test statistic based on $N_T(h)$. As can be seen,
$N_T(h)$ involves both a triple summation and a kind of random denominator
problem, which may cause more difficulty and technicality than in
the stationary case. Our experience with the
stationary case also shows that a test statistic based on $N_T(h)$ has less
attractive properties than one based on $M_T(h)$ below [see, e.g., Li (1999),
Gao and King (2004) and Chapter 3 of Gao (2007)].

We thus propose using a test statistic of the form

(2.3)    $M_T = M_T(h) = \sum_{t=1}^{T} \sum_{s=1, s \ne t}^{T} \hat u_s K_h(X_{s-1} - X_{t-1}) \hat u_t,$

where $\hat u_t = X_t - \hat g(X_{t-1})$. We now introduce the following conditions.

Assumption 2.1. (i) Suppose that $\{u_t\}$ is a sequence of independent
and identically distributed (i.i.d.) errors with $E[u_1] = 0$ and $E[u_1^2] = \sigma_u^2 < \infty$.
Let $0 < \mu_4 = E[u_1^4] < \infty$.

(ii) Suppose that $\{u_t\}$ has a symmetric density function $f(u)$. Let $f'(u)$
be the first derivative of $f(u)$ and $f'(u)$ be continuous at $u \in (-\infty, \infty)$. Let
$\psi(\cdot)$ be the characteristic function of $\{u_t\}$ satisfying $\int |v| |\psi(v)| \, dv < \infty$.

(iii) Let $K(\cdot)$ be a symmetric probability density function. Suppose that
there are constants $c_1 > 0$ and $0 < c_2 \le c_3 < \infty$ such that $c_2 I(|u| \le c_1) \le
K(u) \le c_3 I(|u| \le c_1)$. In addition, suppose that $|K(x + y) - K(x)| \le \Psi(x)|y|$
for all $x \in C(K)$ and any small $y$, where $\Psi(x)$ is a nonnegative bounded
function for all $x \in C(K)$ and $C(K)$ denotes the compact support of $K(\cdot)$.

(iv) Assume that $h$ satisfies $\lim_{T \to \infty} T^{3/10} h = 0$ and
$\limsup_{T \to \infty} T^{1/2 - \varepsilon_0} h = \infty$ for all $0 < \varepsilon_0 < \frac{1}{5}$.

Remark 2.1. The i.i.d. assumption in Assumption 2.1(i) is needed to
ensure that the partial sum $S_t = \sum_{s=1}^{t} u_s$ has independent increments,
although $\{S_t\}$ itself is nonstationary and dependent. Under this assumption,
we are able to establish the main results of this paper in Theorems 2.1, 2.2
and 3.1 below. Assumption 2.1(ii) imposes some mild conditions on both the
density function and the characteristic function, and it holds in many cases.
The condition $\int |v| |\psi(v)| \, dv < \infty$ is to ensure certain convergence results.
Let $\phi_T(x)$ be the density function of $\frac{1}{\sqrt{T} \sigma_u} \sum_{t=1}^{T} u_t$. Then under Assumption
2.1(ii),

(2.4)    $\sup_x |\phi_T(x) - \phi(x)| \to 0$ and $\sup_x |\phi_T'(x) - \phi'(x)| \to 0,$

where $\phi_T'(x)$ and $\phi'(x)$ are first derivatives, and $\phi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ is the
density function of the standard normal random variable $N(0, 1)$. The proof
of (2.4) is quite standard [Chapters 8 and 9 of Chow and Teicher (1988)].

Assumption 2.1(iii) also holds in many cases. For example, when $K(x) =
\frac{1}{2} I_{[-1,1]}(x)$, Assumption 2.1(iii) holds automatically. In addition,
Assumption 2.1(iv) does not look unnatural in the nonstationary case, although
it looks more restrictive than that required for the stationary case.
Moreover, the conditions of Theorems 5.1 and 5.2 of Karlsen and Tjøstheim
(2001) imposed on $h$ become simplified since we are interested in the special
case of random walk with a tail index of $\beta = \frac{1}{2}$ involved in those
conditions. Such conditions on the bandwidth for nonparametric testing in the
nonstationary case are equivalent to the minimal conditions $\lim_{T \to \infty} h = 0$,
$\lim_{T \to \infty} Th = \infty$ and $\lim_{T \to \infty} Th^4 = 0$ required in nonparametric kernel
testing for the stationary time series cases [see, e.g., Gao (2007)].

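Let $\hat\sigma_T^2 = \hat\sigma_T^2(h) = 2 \sum_{t=1}^{T} \sum_{s=1, s \ne t}^{T} \hat u_s^2 K_h^2(X_{s-1} - X_{t-1}) \hat u_t^2$. As can be seen
from the proof of Theorem 2.2 below, under $H_0$ we have for the normalized
test statistic

         $\hat L_T(h) = \frac{\sum_{t=1}^{T} \sum_{s=1, s \ne t}^{T} \hat u_s K((X_{t-1} - X_{s-1})/h) \hat u_t}{\sqrt{2 \sum_{t=1}^{T} \sum_{s=1, s \ne t}^{T} \hat u_s^2 K^2((X_{t-1} - X_{s-1})/h) \hat u_t^2}}$
(2.5)
                $= \frac{\sum_{t=2}^{T} \sum_{s=1}^{t-1} u_s K((\sum_{i=s+1}^{t-1} u_i + u_s)/h) u_t}{\sqrt{\sum_{t=2}^{T} \sum_{s=1}^{t-1} u_s^2 K^2((\sum_{i=s+1}^{t-1} u_i + u_s)/h) u_t^2}} + o_P(1).$

In comparison with existing forms for the stationary case [e.g., (34) of
Arapis and Gao (2006)], establishing an asymptotic distribution for $\hat L_T(h)$
becomes nonstandard mainly due to the fact that $\{X_t\}$ is now nonstationary
and $\{u_s\}$ is involved in both the argument of $K(\cdot)$ and in a factor multiplying
$K(\cdot)$.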
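
The first line of (2.5) can be evaluated directly from data. The following is a minimal sketch, assuming the uniform kernel $K(u) = \frac{1}{2} I_{[-1,1]}(u)$ and an input array that starts with the initial value $X_0$; the function name `l_statistic` is hypothetical.

```python
import numpy as np

def l_statistic(X, h):
    """Normalized kernel statistic M_T(h) / sigma_hat_T from (2.3) and (2.5).

    X is the array (X_0, X_1, ..., X_T); h is the bandwidth.
    Assumes the uniform kernel K(u) = 1/2 on [-1, 1].
    """
    X = np.asarray(X, dtype=float)
    lagged, response = X[:-1], X[1:]
    # K[t, s] = K((X_{t-1} - X_{s-1}) / h); symmetric because K is symmetric
    K = 0.5 * (np.abs((lagged[:, None] - lagged[None, :]) / h) <= 1.0)
    # Nadaraya-Watson weights; each row sums to one (the diagonal is always inside the window)
    W = K / K.sum(axis=1, keepdims=True)
    resid = response - W @ response          # u_hat_t = X_t - g_hat(X_{t-1})
    K_off = K.copy()
    np.fill_diagonal(K_off, 0.0)             # drop the s = t terms
    m_stat = resid @ K_off @ resid
    sigma2 = 2.0 * (resid**2) @ (K_off**2) @ (resid**2)
    if sigma2 <= 0:
        raise ValueError("no pair of observations falls inside the kernel window")
    return m_stat / np.sqrt(sigma2)
```

The double sums are written as quadratic forms, so the cost is $O(T^2)$ in both time and memory, which is manageable for the sample sizes considered in Section 4.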

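Let

$\sigma_T^2 = 2 \sum_{t=1}^{T} \sum_{s=1, s \ne t}^{T} E[u_s^2 K_h^2(X_{s-1} - X_{t-1}) u_t^2].$

Before we study asymptotic properties of $\hat L_T(h)$, we need to evaluate the
asymptotic order of $\sigma_T^2$ in Theorem 2.1 below. The proof is given in Lemma
A.1 in the Appendix below.

Theorem 2.1. Consider model (1.1). Assume that Assumption 2.1 holds.
Then under $H_0$

$\sigma_T^2 = C_{10} T^{3/2} h (1 + o(1)),$

where $C_{10} = \frac{16 \sigma_u^3 J_{02}}{3 \sqrt{2\pi}}$, in which $\sigma_u^2 = E[u_1^2]$ and $J_{02} = \int K^2(x) \, dx$.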




GAO, KING, LU AND TJØSTHEIM



Note that $\sigma_T^2$ is proportional to $T^{3/2} h$. When $\{X_t\}$ of model (1.1) is
stationary, however, $\sigma_T^2$ is proportional to $T^2 h$, as has been given in the
literature [Gao (2007)]. Theorem 2.2 below shows that standard normality
can still be the limiting distribution of a test statistic under nonstationarity.
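
The $T^{3/2} h$ rate can be motivated by a rough calculation. The following is only a heuristic sketch, not the argument of Lemma A.1: it assumes that $\sigma_T^2$ is the sum $2 \sum_t \sum_{s \ne t} E[u_s^2 K_h^2(X_{s-1} - X_{t-1}) u_t^2]$, that the density of a sum of $t - s$ errors may be replaced near the origin by its normal approximation, and it ignores all remainder terms.

```latex
% For s < t under H_0, X_{t-1} - X_{s-1} = u_s + \dots + u_{t-1} has scale
% \sigma_u \sqrt{t-s}, so its density near the origin is about
% 1 / (\sigma_u \sqrt{2\pi (t-s)}). With u_t independent of the rest,
\begin{align*}
E\big[u_s^2 K^2\big((X_{t-1}-X_{s-1})/h\big) u_t^2\big]
  &\approx \sigma_u^2 \cdot \sigma_u^2 \cdot \frac{h J_{02}}{\sigma_u \sqrt{2\pi (t-s)}}
   = \frac{\sigma_u^3 J_{02}\, h}{\sqrt{2\pi (t-s)}}, \\
\sum_{t=2}^{T} \sum_{s=1}^{t-1} (t-s)^{-1/2}
  &\approx \sum_{t=2}^{T} 2\sqrt{t} \approx \tfrac{4}{3} T^{3/2}, \\
\sigma_T^2 = 4 \sum_{t=2}^{T} \sum_{s=1}^{t-1} E[\,\cdot\,]
  &\approx \frac{16 \sigma_u^3 J_{02}}{3\sqrt{2\pi}}\, T^{3/2} h .
\end{align*}
% In the stationary case each of the roughly T^2 pairs contributes a term of
% order h, which gives the familiar T^2 h rate instead.
```

The slower growth under nonstationarity reflects the fact that a random walk revisits any given neighborhood far less often than a stationary process does, so fewer pairs $(X_{s-1}, X_{t-1})$ fall inside the kernel window.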



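Theorem 2.2. Consider model (1.1). Suppose that Assumption 2.1 holds.
Then under $H_0$ and as $T \to \infty$

         $\hat L_T(h) = \frac{M_T(h)}{\hat\sigma_T} = \frac{\sum_{t=1}^{T} \sum_{s=1, s \ne t}^{T} \hat u_s K_h(X_{s-1} - X_{t-1}) \hat u_t}{\hat\sigma_T}$
(2.6)
                $= \frac{\sum_{t=2}^{T} \sum_{s=1}^{t-1} u_s K_h(X_{s-1} - X_{t-1}) u_t}{\sqrt{\sum_{t=2}^{T} \sum_{s=1}^{t-1} u_s^2 K_h^2(X_{s-1} - X_{t-1}) u_t^2}} + o_P(1) \to_D N(0, 1).$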



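Proof. Observe that under $H_0$

         $M_T(h) = \sum_{t=1}^{T} \sum_{s=1, s \ne t}^{T} \hat u_s K_h(X_{s-1} - X_{t-1}) \hat u_t$

                $= \sum_{t=1}^{T} \sum_{s=1, s \ne t}^{T} u_s K_h(X_{s-1} - X_{t-1}) u_t$

(2.7)             $+ \sum_{t=1}^{T} \sum_{s=1, s \ne t}^{T} \delta_s K_h(X_{s-1} - X_{t-1}) \delta_t$

                  $+ 2 \sum_{t=1}^{T} \sum_{s=1, s \ne t}^{T} u_s K_h(X_{s-1} - X_{t-1}) \delta_t$

                $= M_{T1} + M_{T2} + M_{T3},$

         $\hat\sigma_T^2 = 2 \sum_{t=1}^{T} \sum_{s=1, s \ne t}^{T} \hat u_s^2 K_h^2(X_{s-1} - X_{t-1}) \hat u_t^2$

(2.8)           $= 2 \sum_{t=1}^{T} \sum_{s=1, s \ne t}^{T} u_s^2 K_h^2(X_{s-1} - X_{t-1}) u_t^2$

                  $+ 2 \sum_{t=1}^{T} \sum_{s=1, s \ne t}^{T} \delta_s^2 K_h^2(X_{s-1} - X_{t-1}) \delta_t^2 + R_T,$

where $\delta_t = X_{t-1} - \hat g(X_{t-1})$ and $R_T$ is the remainder term given by

$R_T = \hat\sigma_T^2 - 2 \sum_{t=1}^{T} \sum_{s=1, s \ne t}^{T} u_s^2 K_h^2(X_{s-1} - X_{t-1}) u_t^2
       - 2 \sum_{t=1}^{T} \sum_{s=1, s \ne t}^{T} \delta_s^2 K_h^2(X_{s-1} - X_{t-1}) \delta_t^2.$

In view of (2.7) and (2.8), to prove Theorem 2.2, it suffices to show that
as $T \to \infty$

(2.9)     $\frac{M_{T1}}{\sigma_T} \to_D N(0, 1),$

(2.10)    $\frac{M_{Ti}}{\sigma_T} \to_P 0$ for $i = 2, 3,$

(2.11)    $\frac{\hat\sigma_T^2 - \sigma_T^2}{\sigma_T^2} \to_P 0,$

where $\sigma_T^2 = 2 \sum_{t=1}^{T} \sum_{s=1, s \ne t}^{T} E[u_s^2 K_h^2(X_{s-1} - X_{t-1}) u_t^2]$.

The proof of (2.9) is given in Lemma A.3 of the Appendix below. In view
of (2.9), to complete the proof of Theorem 2.2, it suffices to prove (2.10) and
(2.11). We now give the proof of (2.10) and then an outline of the proof of
(2.11).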

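It follows from (2.9) that

(2.12)    $\frac{1}{\sigma_T} \sum_{t=1}^{T} \sum_{s=1, s \ne t}^{T} u_s K\left(\frac{X_{t-1} - X_{s-1}}{h}\right) u_t = O_P(1).$

In order to prove (2.10), we first need to show that

(2.13)    $\frac{M_{T2}}{\sigma_T} = o_P(1).$

Observe that under $H_0: X_t = X_{t-1} + u_t$

         $\delta_t = X_{t-1} - \hat g(X_{t-1})$

(2.14)      $= X_{t-1} - \sum_{s=1}^{T} W_T(X_{t-1}, X_{s-1}) X_s$

            $= X_{t-1} - \sum_{s=1}^{T} W_T(X_{t-1}, X_{s-1}) X_{s-1} - \sum_{s=1}^{T} W_T(X_{t-1}, X_{s-1}) u_s = \bar X_{t-1} - \bar u_t,$

where $\bar X_{t-1} = X_{t-1} - \sum_{s=1}^{T} W_T(X_{t-1}, X_{s-1}) X_{s-1}$ and $\bar u_t = \sum_{s=1}^{T} W_T(X_{t-1},
X_{s-1}) u_s$.

Thus, in order to show (2.13), it suffices to show that

(2.15)    $\sum_{t=1}^{T} \sum_{s=1, s \ne t}^{T} \bar X_{s-1} K\left(\frac{X_{t-1} - X_{s-1}}{h}\right) \bar X_{t-1} = o_P(\sigma_T),$

(2.16)    $\sum_{t=1}^{T} \sum_{s=1, s \ne t}^{T} \bar u_s K\left(\frac{X_{t-1} - X_{s-1}}{h}\right) \bar u_t = o_P(\sigma_T).$

The proof of (2.16) is quite technical and thus relegated to Lemma E.1
in Appendix E of Gao et al. (2008). Meanwhile, Assumption 2.1(iii) and a
conventional approach [see, e.g., the proof of Theorem 5.1 of Karlsen and
Tjøstheim (2001)] imply that uniformly in $x$,

         $\tilde g(x) = g(x) - \sum_{s=1}^{T} W_T(x, X_{s-1}) g(X_{s-1})$

(2.17)      $= \frac{\sum_{s=1}^{T} K((x - X_{s-1})/h) (g(x) - g(X_{s-1}))}{\sum_{s=1}^{T} K((x - X_{s-1})/h)}$

            $= g'(x) h \int u K(u) \, du \, (1 + o_P(1)) = o_P(h),$

when $g(\cdot)$ is differentiable and the first derivative, $g'(x)$, is continuous.

Using (2.17) for the case of $g(x) = x$, in order to prove (2.15), it suffices
to show that

(2.18)    $h^2 \sum_{t=1}^{T} \sum_{s=1, s \ne t}^{T} K\left(\frac{X_{t-1} - X_{s-1}}{h}\right) = o_P(\sigma_T),$

which follows from

(2.19)    $\sum_{t=1}^{T} \sum_{s=1, s \ne t}^{T} E\left[K\left(\frac{X_{t-1} - X_{s-1}}{h}\right)\right] = O(T^{3/2} h)$

and Assumption 2.1(iv). The verification of (2.19) is similar to but simpler
than that of (A.3) below.

Hence, (2.13) and (A.50) in the Appendix below imply

(2.20)    $\frac{M_{T2}}{\hat\sigma_T} = \frac{M_{T2}}{\sigma_T} \cdot \frac{\sigma_T}{\hat\sigma_T} = o_P(1).$

This proves (2.10) for $i = 2$. Furthermore, the proof of (2.10) for $i = 3$
follows from (2.12)-(2.20) and

$\left| \sum_{t=1}^{T} \sum_{s=1, s \ne t}^{T} u_s K\left(\frac{X_{t-1} - X_{s-1}}{h}\right) \delta_t \right|
   \le \left( \sum_{t=1}^{T} \sum_{s=1, s \ne t}^{T} u_s K\left(\frac{X_{t-1} - X_{s-1}}{h}\right) u_t \right)^{1/2}
       \left( \sum_{t=1}^{T} \sum_{s=1, s \ne t}^{T} \delta_s K\left(\frac{X_{t-1} - X_{s-1}}{h}\right) \delta_t \right)^{1/2}$

   $= O_P(\sigma_T^{1/2}) \cdot o_P(\sigma_T^{1/2}) = o_P(\sigma_T),$

where $\delta_t = X_{t-1} - \hat g(X_{t-1})$.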

In view of the definitions of $\hat\sigma_T^2$ and $\sigma_T^2$, (2.17) above, (A.50) in the Appendix
and

^2 ~2 ^2 ~2 2 

(Jrp (Jrp (Jrp (J p (Jrp 

~2 2 ' ~2 ' 

(Jrp (J rp (Tp 

in order to prove (2.11), it suffices to show that

(2.21)    $\sum_{t=1}^{T} \sum_{s=1, s \ne t}^{T} \delta_s^2 K^2\left(\frac{X_{s-1} - X_{t-1}}{h}\right) \delta_t^2 = o_P(\sigma_T^2).$

The verification of (2.21) is similar to that of (2.16). This completes the
proof of Theorem 2.2. □

Existing studies of test statistics analogous to $\hat L_T(h)$ for the stationary
time series case show that the size function of the test is not well
approximated using a normal limit distribution. The main reasons are as follows:
(a) the rate of convergence of each $\hat L_T(h)$ to asymptotic normality is quite
slow even when $\{u_t\}$ is a sequence of independent and identically distributed
errors; and (b) the use of a single bandwidth based on an optimal estimation
criterion may not be optimal for testing purposes.

In order to improve the finite-sample performance of $\hat L_T(h)$, we propose
using a bootstrap simulation method. Such a method is known to work
quite well in the stationary case. For each given bandwidth satisfying
certain theoretical conditions, instead of using an asymptotic critical value of
$z_{0.05} = 1.645$ at the 5% level, for example, we use a simulated critical value
for computing the size function and then the power function. An optimal
bandwidth is chosen such that, while the size function is controlled by a
significance level, the power function is maximized at the optimal bandwidth.
Our finite-sample studies show that there is little size distortion when using
such a simulated critical value. These issues are discussed in Section 3 below.



3. Bootstrap simulation and asymptotic theory. In order to assess the
performance of both the size and power functions, we need to discuss how
to simulate critical values for the implementation of $\hat L_T(h)$ in each case. We
then examine the finite-sample performance using two examples in
Section 4 below. Before we look at how to implement $\hat L_T(h)$ in practice, we
propose the following simulation scheme.

Simulation scheme: the exact $\alpha$-level critical value, $l_\alpha(h)$ ($0 < \alpha < 1$), is
the $1 - \alpha$ quantile of the exact finite-sample distribution of $\hat L_T(h)$. Because
there are unknown quantities, such as unknown parameters and functions,
we cannot evaluate $l_\alpha(h)$ in practice. We propose choosing an approximate
$\alpha$-level critical value, $l_\alpha^*(h)$, by using the following simulation procedure:

• Let $X_0 = 0$. For each $t = 1, 2, \ldots, T$, generate $X_t^* = X_{t-1} + \hat\sigma_u e_t^*$, where
  $\hat\sigma_u^2$ is a consistent estimator of $\sigma_u^2 = E[u_1^2]$ based on the original sample
  $(X_1, X_2, \ldots, X_T)$, and $\{e_t^*\}$ is constructed using either a parametric
  bootstrap method or a nonparametric bootstrap method.

• Use the data set $\{X_t^* : t = 1, 2, \ldots, T\}$ to re-estimate $g(\cdot)$ by $\hat g^*(x) =
  \sum_{s=1}^{T} W_T(x, X_{s-1}) X_s^*$. Let $\hat u_t^* = X_t^* - \hat g^*(X_{t-1})$. Compute the test
  statistic $\hat L_T^*(h)$, the version of $\hat L_T(h)$ obtained by replacing $\hat u_t$ with
  $\hat u_t^*$ on the right-hand side of $\hat L_T(h)$.

• Repeat the above steps $M$ times and produce $M$ versions of $\hat L_T^*(h)$ denoted
  by $\hat L_{T,m}^*(h)$ for $m = 1, 2, \ldots, M$. Use the $M$ values of $\hat L_{T,m}^*(h)$ to construct
  their empirical bootstrap distribution function. The bootstrap
  distribution of $\hat L_T^*(h)$ given $\mathcal{X}_T = \{X_t : 1 \le t \le T\}$ is defined by
  $P^*(\hat L_T^*(h) \le x) = P(\hat L_T^*(h) \le x \mid \mathcal{X}_T)$. Let $l_\alpha^*(h)$ satisfy

      $P^*(\hat L_T^*(h) \ge l_\alpha^*(h)) = \alpha$

  and then estimate $l_\alpha(h)$ by $l_\alpha^*(h)$.

• Define the size and power functions by

      $\alpha(h) = P(\hat L_T(h) \ge l_\alpha^*(h) \mid H_0)$ and $\beta(h) = P(\hat L_T(h) \ge l_\alpha^*(h) \mid H_1).$

  Let $\mathcal{H} = \{h : \alpha(h) \le \alpha\}$. Choose an optimal bandwidth $\hat h_{test}$ such that

      $\hat h_{test} = \arg\max_{h \in \mathcal{H}} \beta(h).$

  We then use $l_\alpha^*(\hat h_{test})$ in the computation of both the size and power
  values of $\hat L_T(\hat h_{test})$ for each case.
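
The first three steps of the simulation scheme can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: it uses the uniform kernel, draws $e_t^*$ from $N(0, 1)$ (a parametric bootstrap), and estimates $\sigma_u$ by the root mean square of the kernel residuals, which is only one possible consistent estimator. All function names are hypothetical.

```python
import numpy as np

def _kernel_matrices(lagged, h):
    """Return NW weights W and the off-diagonal kernel matrix for the uniform kernel."""
    K = 0.5 * (np.abs((lagged[:, None] - lagged[None, :]) / h) <= 1.0)
    W = K / K.sum(axis=1, keepdims=True)
    K_off = K.copy()
    np.fill_diagonal(K_off, 0.0)
    return W, K_off

def _standardized(resid, K_off):
    """M_T / sigma_hat_T for a given residual vector."""
    sigma2 = 2.0 * (resid**2) @ (K_off**2) @ (resid**2)
    return (resid @ K_off @ resid) / np.sqrt(sigma2)

def bootstrap_critical_value(X, h, alpha=0.05, n_boot=250, rng=None):
    """Observed statistic, simulated alpha-level critical value, and bootstrap draws.

    X is the array (X_0, ..., X_T). Bootstrap samples follow
    X*_t = X_{t-1} + sigma_hat_u * e*_t with e*_t ~ N(0, 1) (assumed).
    """
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    lagged, response = X[:-1], X[1:]
    W, K_off = _kernel_matrices(lagged, h)
    resid = response - W @ response
    sigma_u = np.sqrt(np.mean(resid**2))     # assumed estimator of sigma_u
    observed = _standardized(resid, K_off)
    boot = np.empty(n_boot)
    for m in range(n_boot):
        x_star = lagged + sigma_u * rng.standard_normal(lagged.size)
        boot[m] = _standardized(x_star - W @ x_star, K_off)
    # the (1 - alpha) empirical quantile plays the role of l*_alpha(h)
    return observed, np.quantile(boot, 1.0 - alpha), boot
```

The kernel matrix depends only on the original lagged values, so it is computed once and reused across bootstrap replications. Choosing $\hat h_{test}$ would require repeating this over a grid of bandwidths and over simulated samples under both hypotheses, which is omitted here.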

To study the power function of $\hat L_T(h)$, we specify a sequence of
alternatives of the form:

(3.1)    $H_1: P(g(X_{t-1}) = X_{t-1} + \Delta_T(X_{t-1})) = 1,$

where $\Delta_T(x)$ is a sequence of nonparametrically unknown functions
satisfying certain conditions in Assumption 3.2 below.
Under $H_1$, model (1.1) becomes

(3.2)    $X_t = g(X_{t-1}) + u_t = X_{t-1} + \Delta_T(X_{t-1}) + u_t,$

where $\Delta_T(x)$ can be consistently estimated by

(3.3)    $\hat\Delta_T(x) = \frac{\sum_{t=1}^{T} K_{\hat h_{cv}}(X_{t-1} - x)(X_t - X_{t-1})}{\sum_{t=1}^{T} K_{\hat h_{cv}}(X_{t-1} - x)}$

with $\hat h_{cv}$ being chosen by a conventional cross-validation selection method.
To establish Theorem 3.1 below, we need the following conditions.

Assumption 3.1. (i) Assumption 2.1 holds.

(ii) Suppose that $g(x)$ is differentiable in $x \in R^1 = (-\infty, \infty)$ and that the
first derivative $g'(x)$ is continuous in $x \in R^1$. In addition, $g(\cdot)$ is chosen such
that $\{X_t\}$ of (1.1) under $H_1$ is strictly stationary.

Assumption 3.2. Let $f(x)$ be the marginal density function of $\{X_t\}$
under $H_1$. Suppose that $\{\Delta_T(x)\}$ is either an unknown function of the form
$\Delta(x)$ or a sequence of unknown functions satisfying

(3.4)    $\lim_{T \to \infty} T^{5/4} \sqrt{h} \, \delta^2(T) = \infty$ where $\delta^2(T) = \int \Delta_T^2(x) f^2(x) \, dx.$

Since $g(x)$ is not necessarily identical to $x$ under $H_1$, Assumption 3.1(ii)
reflects that the main interest of this paper is to test linear nonstationarity
against nonlinear stationarity. Some secondary conditions on the form of
$g(\cdot)$ such that $\{X_t\}$ is strictly stationary under $H_1$ are available from Masry
and Tjøstheim (1995).

Assumption 3.2 basically requires that there is some "distance" between
$g(X_{t-1})$ and $X_{t-1}$ when $H_0$ is not true. Obviously, there are many
different ways of choosing $\Delta_T(x)$ for $H_1$. For example, we may consider testing
nonstationarity against stationarity of the form

         $H_0: X_t = X_{t-1} + u_t$ versus
(3.5)
         $H_1: X_t = g(X_{t-1}) + u_t = X_{t-1} + \Delta(X_{t-1}) + u_t,$

where $\{u_t\}$ is a sequence of i.i.d. errors with $E[u_1] = 0$ and $E[u_1^2] = \sigma_u^2 < \infty$,
and $\Delta(\cdot)$ can be either a nonparametric or semiparametric function and is
chosen such that $\{X_t\}$ is stationary under $H_1$. In this case, we have

(3.6)    $\frac{1}{T^2 h} \sum_{t=1}^{T} \sum_{s=1}^{T} \Delta(X_{s-1}) K\left(\frac{X_{s-1} - X_{t-1}}{h}\right) \Delta(X_{t-1})
            = (1 + o(1)) \int \Delta^2(x) f^2(x) \, dx > 0,$

since $\{X_t\}$ under $H_1$ is strictly stationary with $f(\cdot)$ being its marginal
density function. This, along with Assumption 2.1(iv), implies that Assumption
3.2 holds when $\Delta_T(x) = \Delta(x)$.

We now state the following results; their proofs are given below.

Theorem 3.1. (i) Assume that Assumption 3.1 holds. Then under $H_0$

         $\lim_{T \to \infty} P(\hat L_T(h) > l_\alpha^*) = \alpha.$

(ii) Assume that Assumptions 3.1 and 3.2 hold. Then under $H_1$

         $\lim_{T \to \infty} P(\hat L_T(h) > l_\alpha^*) = 1.$

Theorem 3.1(i) implies that each $l_\alpha^*$ is an asymptotically correct
$\alpha$-level critical value under $H_0$, while Theorem 3.1(ii) shows that $\hat L_T(h)$ is
asymptotically consistent against alternatives of the form (3.1) whenever
$\delta(T) \ge C T^{-5/8} h^{-1/4}$ for some finite $C > 0$ in this kind of nonparametric
testing of nonstationarity against stationarity.

Proof of Theorem 3.1. Recall $\hat g^*(x) = \sum_{s=1}^{T} W_T(x, X_{s-1}) X_s^*$ and
$\hat u_t^* = X_t^* - \hat g^*(X_{t-1})$. Let $\delta_t^* = X_{t-1} - \hat g^*(X_{t-1})$. We now have

         $M_T^*(h) = \sum_{t=1}^{T} \sum_{s=1, s \ne t}^{T} \hat u_s^* K_h(X_{s-1} - X_{t-1}) \hat u_t^*$

                $= \sum_{t=1}^{T} \sum_{s=1, s \ne t}^{T} \hat\sigma_u e_s^* K_h(X_{s-1} - X_{t-1}) \hat\sigma_u e_t^*$

(3.7)             $+ \sum_{t=1}^{T} \sum_{s=1, s \ne t}^{T} \delta_s^* K_h(X_{s-1} - X_{t-1}) \delta_t^*$

                  $+ 2 \sum_{t=1}^{T} \sum_{s=1, s \ne t}^{T} \hat\sigma_u e_s^* K_h(X_{s-1} - X_{t-1}) \delta_t^*$

                $= M_{T1}^* + M_{T2}^* + M_{T3}^*.$

Using Assumptions 2.1 and 3.1, in view of the notation of $\hat L_T^*(h)$
introduced in the simulation scheme proposed just above Assumption 3.1 as well
as the proof of Theorem 2.2, we can show that as $T \to \infty$

(3.8)    $P^*(\hat L_T^*(h) \le x) \to \Phi(x)$ for all $x \in (-\infty, \infty)$

holds in probability with respect to the distribution of the original sample
$\{X_{t-1} : 1 \le t \le T\}$, where $\Phi(\cdot)$ is the distribution function of the standard
normal random variable $N(0, 1)$. In order to prove (3.8), in view of the fact
that $\{e_s^*\}$ and $\{X_t\}$ are independent for all $s, t \ge 1$, we can show that the
proofs of Lemmas A.1 and A.3-A.6 below all remain true by successive
conditioning arguments.

Let $z_\alpha$ be the $1 - \alpha$ quantile of $\Phi(\cdot)$ such that $\Phi(z_\alpha) = 1 - \alpha$. Then it
follows from (3.8) that as $T \to \infty$

(3.9)    $P^*(\hat L_T^*(h) \ge z_\alpha) \to 1 - \Phi(z_\alpha) = \alpha.$

This, together with the construction that $P^*(\hat L_T^*(h) \ge l_\alpha^*(h)) = \alpha$, implies
that as $T \to \infty$

(3.10)   $l_\alpha^*(h) - z_\alpha \to_P 0.$

Using the conclusion of Theorem 2.2 and (3.8) again, we have that as
$T \to \infty$

(3.11)   $P^*(\hat L_T^*(h) \le x) - P(\hat L_T(h) \le x) \to_P 0$ for all $x \in (-\infty, \infty)$.

This, along with the construction that $P^*(\hat L_T^*(h) \ge l_\alpha^*(h)) = \alpha$ again, implies

(3.12)   $\lim_{T \to \infty} P(\hat L_T(h) \ge l_\alpha^*(h)) = \alpha.$

Therefore, the conclusion of Theorem 3.1(i) is proved.

Recall $\hat u_t = X_t - \hat g(X_{t-1})$ and let $\lambda_t = X_{t-1} - g(X_{t-1})$. To prove
Theorem 3.1(ii), we need to recall the decomposition of $M_T(h)$ in (2.7). Recalling
$\delta_t = X_{t-1} - \hat g(X_{t-1})$ and $\lambda_t = X_{t-1} - g(X_{t-1})$, we have

$\delta_t = X_{t-1} - \hat g(X_{t-1}) = X_{t-1} - g(X_{t-1}) + g(X_{t-1}) - \hat g(X_{t-1})$

     $= X_{t-1} - g(X_{t-1}) + g(X_{t-1}) - \sum_{s=1}^{T} W_T(X_{t-1}, X_{s-1}) g(X_{s-1})
        - \sum_{s=1}^{T} W_T(X_{t-1}, X_{s-1}) u_s$

     $= \lambda_t + \tilde g(X_{t-1}) - \bar u_t,$

where $\tilde g(X_{t-1}) = g(X_{t-1}) - \sum_{s=1}^{T} W_T(X_{t-1}, X_{s-1}) g(X_{s-1})$ and
$\bar u_t = \sum_{s=1}^{T} W_T(X_{t-1}, X_{s-1}) u_s$. In view of the
proof of Theorem 2.2, (2.15)-(2.17) in particular, as well as (3.10), in order
to prove Theorem 3.1(ii), it suffices to show that under $H_1$

(3.13)   $\frac{1}{\sigma_T} \sum_{t=1}^{T} \sum_{s=1}^{T} \lambda_s K_h(X_{s-1} - X_{t-1}) \lambda_t \to_P \infty.$

Similarly to (3.6), we have under $H_1$

(3.14)   $\frac{1}{\sigma_T} \sum_{t=1}^{T} \sum_{s=1}^{T} (g(X_{s-1}) - X_{s-1}) K\left(\frac{X_{s-1} - X_{t-1}}{h}\right) (g(X_{t-1}) - X_{t-1})$

            $= \frac{T^2 h}{\sigma_T} (1 + o(1)) \int \Delta_T^2(x) f^2(x) \, dx = C T^{5/4} \sqrt{h} \, \delta^2(T) (1 + o(1)).$

The verification of (3.13) follows from (3.14) and Assumption 3.2. This
finishes the proof of Theorem 3.1. □

Section 4 below illustrates Theorem 3.1 using a
simulated example and then a real data application.

4. Examples of implementation. This section studies some finite-sample
properties of both the size and power functions of the proposed test
using two examples. Example 4.1 assesses the finite-sample performance
using simulated data. A real data application is given in Example 4.2.
Throughout Examples 4.1 and 4.2 below, we use $K(x) = \frac{1}{2} I_{[-1,1]}(x)$.

Example 4.1. Consider a nonlinear time series model of the form

(4.1)    $X_t = X_{t-1} + \Delta(X_{t-1}) + u_t,$

where $X_0 = 0$, $\{u_t\}$ is a sequence of independent normal random errors with
$E[u_1] = 0$ and $E[u_1^2] = \sigma_u^2 < \infty$, and $\Delta(x)$ is chosen as a known parametric
function with some unknown parameters in the following data generating
process.

We then consider two different cases as follows:

         $H_0: X_t = X_{t-1} + u_t$ versus
(4.2)
         $H_1: X_t = X_{t-1} + \beta X_{t-1} + u_t$

and 

H :X t = X t -\ + u t versus 

(4.3) 

H 1 :X t = X t -! + (iX t -x + t—£ |, + u t , 

1 + |At_i| ' 

where $0 < \gamma < \infty$, $-2 < \beta < 0$ and $0 < \sigma_u < \infty$ are unknown parameters to be
estimated using the conventional MLE method [see Granger and Teräsvirta
(1993)].

Since we are interested in assessing the performance of the proposed test
for a number of different values of $\beta$, the fixed values of $\sigma_u^2 = 0.05$ and
$\gamma = 2$ are used in generating the data. In addition to the case of $\sigma_u^2 = 0.05$,
we have also tried some other values of $\sigma_u$. As our preliminary results show
that the resulting finite-sample results are very similar, we focus on the case
of $\sigma_u^2 = 0.05$ in this example.
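
Data from the linear pair (4.2) can be generated with a few lines of code. The sketch below is illustrative only: the function name `simulate_linear_alternative` is hypothetical, and the errors are drawn as $N(0, \sigma_u^2)$ with $\sigma_u^2 = 0.05$ as described above. Setting `beta=0.0` gives the random walk under $H_0$.

```python
import numpy as np

def simulate_linear_alternative(T, beta, sigma2_u=0.05, rng=None):
    """Simulate X_t = X_{t-1} + beta * X_{t-1} + u_t with X_0 = 0, as in (4.2).

    Returns the array (X_0, X_1, ..., X_T). beta = 0 is the null model;
    -2 < beta < 0 gives a stationary linear autoregression.
    """
    rng = np.random.default_rng() if rng is None else rng
    u = rng.normal(scale=np.sqrt(sigma2_u), size=T)
    X = np.zeros(T + 1)
    for t in range(1, T + 1):
        X[t] = (1.0 + beta) * X[t - 1] + u[t - 1]
    return X

# Example usage: one path under H_0 and one under H_1 with beta = -0.05
path_null = simulate_linear_alternative(250, 0.0, rng=np.random.default_rng(10))
path_alt = simulate_linear_alternative(250, -0.05, rng=np.random.default_rng(10))
```

Using the same seed for both paths means they share the same error sequence, which makes the effect of $\beta$ on the trajectory easy to compare.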

Note that $\{X_t\}$ of (4.2) is nonstationary under $H_0$, while it is strictly
stationary and $\alpha$-mixing under $H_1$ with $0 < \gamma < \infty$ and $-2 < \beta < 0$ in both
cases. With the choice of the values for $\beta$ and $\gamma$, the time series $\{X_t\}$ of (4.3)
is also strictly stationary under $H_1$ [see, e.g., Masry and Tjøstheim (1995)].
In the simulation, we consider various values of $-2 < \beta < 0$ when computing
the power of $\hat L_T(h)$.

As pointed out in the literature for the i.i.d. and stationary time series
cases [Hjellvik, Yao and Tjøstheim (1998), Li and Wang (1998), Fan and
Linton (2003), Gao (2007) and Gao and Gijbels (2008)], the choice of a
kernel bandwidth for testing purposes is quite critical and difficult. In the
nonstationary case, however, how to choose an optimal bandwidth parameter
is still an open problem.

Thus, in the finite-sample study, we apply the first part of the simulation
scheme proposed in Section 3 to simulate a bootstrap critical function $l_\alpha^*(h)$
for each given $h$ in each individual case. We then choose an optimal value for
$h$ in each case such that the power function is maximized at such an optimal
$\hat h_{test}$. For each case of $T = 250$, 500 or 750, the finite-sample assessment of
the corresponding size and power functions suggests choosing $\hat h_{test} = 0.160$
when $T = 250$, 0.117 for $T = 500$ and 0.097 when $T = 750$.

To assess the variability of both the size and power with respect to various 
bandwidth values, we then consider a set of bandwidth values of the form 



1 

2^ 



hi — „ e ■ ht es t 



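for $1 \le i \le 5$ with $L_5 = \hat L_T(\hat h_{test})$. To simplify the notation, we introduce
$L_i = \hat L_T(h_i)$ for $1 \le i \le 5$. Since the alternative of model (4.2) is a linear
form, we may compare our test with a version of the Dickey-Fuller test of
the form [Dickey and Fuller (1979)]

(4.4)    $L_0 = \frac{\sum_{t=2}^{T} (X_t - X_{t-1}) X_{t-1}}{\tilde\sigma_u \sqrt{\sum_{t=2}^{T} X_{t-1}^2}},$

where $\tilde\sigma_u^2 = \frac{1}{T} \sum_{t=1}^{T} (X_t - X_{t-1} - \tilde\beta X_{t-1})^2$ with $\tilde\beta = \frac{\sum_{t=1}^{T} X_{t-1}(X_t - X_{t-1})}{\sum_{t=1}^{T} X_{t-1}^2}$.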
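
For comparison purposes, a Dickey-Fuller type statistic can be computed as in the sketch below. The exact normalization used in the paper cannot be read reliably from the text, so this assumes the standard no-intercept t-type form: the least squares slope of $X_t - X_{t-1}$ on $X_{t-1}$ divided by its estimated standard error, with the residual variance computed using divisor $T$. The function name `dickey_fuller_t` is hypothetical.

```python
import numpy as np

def dickey_fuller_t(X):
    """Dickey-Fuller t-type statistic for X = (X_0, ..., X_T), no intercept (assumed form).

    Regresses the first difference on the lagged level and returns
    sum(diff * lagged) / (sigma_tilde * sqrt(sum(lagged**2))).
    """
    X = np.asarray(X, dtype=float)
    lagged, diff = X[:-1], np.diff(X)
    sum_sq = np.sum(lagged**2)
    cross = np.sum(lagged * diff)
    beta = cross / sum_sq                                  # least squares slope
    sigma = np.sqrt(np.mean((diff - beta * lagged) ** 2))  # residual scale, divisor T
    return cross / (sigma * np.sqrt(sum_sq))
```

Under a unit root this statistic does not have a standard normal limit, so in a size and power comparison its critical values would also be obtained by simulation.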

In the following tables, the number of replications
of each of the sample versions of the size and power functions was M = 1000,
each with B = 250 bootstrap resamples $\{e_t^*\}$ (involved in the
simulation scheme in Section 3 above) drawn from the standard normal distribution
N(0, 1), and the simulations were done for the cases of T = 250, 500 and 750.

Table 1 shows that while the sizes are comparable, the conventional test $L_0$
is more powerful than the proposed test $L_5$, as expected when the alternative
model is a linear autoregressive model. However, the biggest power reduction
is only about 36% in the case of T = 250 and $\beta = -0.05$. This may suggest
that we should use the proposed test for nonstationarity in the conditional
mean when there is no a priori information about the form of the conditional
mean.

When the alternative is a nonlinear parametric form as in (4.3), our
studies show that $L_0$ is basically inferior to our test in the sense that it is much
less powerful than the proposed test. We now give the corresponding
simulated sizes and power values with 1000 replications for model (4.3) for both
of the tests in Tables 2-5 below.

Table 1
Simulated sizes and power values at the 5% level

              T = 250           T = 500           T = 750
  beta      L_0     L_5       L_0     L_5       L_0     L_5
   0.00    0.037   0.041     0.059   0.039     0.054   0.051
  -0.05    0.718   0.464     1.000   0.679     1.000   0.804
  -0.10    0.999   0.811     1.000   0.966     1.000   0.986
  -0.20    1.000   0.993     1.000   1.000     1.000   1.000

Table 2
Simulated sizes at the 5% level

    T       L_1     L_2     L_3     L_4     L_5     L_0
   250     0.003   0.010   0.034   0.047   0.039   0.038
   500     0.007   0.017   0.026   0.041   0.037   0.061
   750     0.005   0.014   0.038   0.050   0.049   0.056

Table 3
Power values for T = 250 at the 5% level

  beta      L_1     L_2     L_3     L_4     L_5     L_0
  -0.05    0.095   0.112   0.129   0.141   0.207   0.087
  -0.10    0.206   0.268   0.350   0.438   0.647   0.127
  -0.20    0.566   0.726   0.881   0.972   0.998   0.421
  -0.40    0.984   0.999   1.000   1.000   1.000   0.678

Table 4
Power values for T = 500 at the 5% level

  beta      L_1     L_2     L_3     L_4     L_5     L_0
  -0.05    0.160   0.202   0.249   0.323   0.477   0.097
  -0.10    0.432   0.568   0.746   0.889   0.982   0.231
  -0.20    0.923   0.993   1.000   1.000   1.000   0.519
  -0.40    1.000   1.000   1.000   1.000   1.000   0.754

Table 5
Power values for T = 750 at the 5% level

  beta      L_1     L_2     L_3     L_4     L_5     L_0
  -0.05    0.279   0.280   0.358   0.461   0.694   0.121
  -0.10    0.663   0.753   0.905   0.977   0.999   0.398
  -0.20    0.992   0.999   1.000   1.000   1.000   0.689
  -0.40    1.000   1.000   1.000   1.000   1.000   0.842

The finite-sample results given in Tables 2-5 show that the proposed test
and the simulation scheme work well numerically. Table 2 lists the sizes for
$L_i$ for $1 \le i \le 5$ and $L_0$. While the sizes are relatively low for $L_5$ in the cases
of T = 250 and T = 500, the size function approaches 5% when T is as large
as 750. Most importantly, with such choices of the simulated critical values,
Tables 3-5 show that the proposed test is powerful for nonstationarity
versus stationarity. For example, when the "distance" between nonstationarity
and stationarity is as small as for $\beta = -0.05$, the maximum of the power for
T = 250 at the 5% level is already over 20%. Comparing the power
values of $L_0$ with those of $L_i$, $1 \le i \le 5$, our observation is that the
Dickey-Fuller test is inferior for the case where the alternative is nonlinear.
This further supports proposing a test for dealing with such nonparametric
nonstationarity.

As Tables 2-5 show, the corresponding power value of $L_4$ in each case
is only the second best among $L_i$ for $1 \le i \le 5$ if we choose an optimal
bandwidth such that the simulated size is the closest to 5%. Thus, our
finite-sample studies also support the fact that there is a kind of trade-off between
sizes and power values.

Example 4.2. This example examines the three-month Treasury Bill rate
data given in Figure 1 below, sampled monthly over the period from
January 1963 to December 1998, providing 432 observations.

Let $\{X_t : t = 1, 2, \ldots, 432\}$ be the set of Treasury Bill rate data.
As Figure 1 does not suggest that there is any significant trend for the
data set, it is not unreasonable to assume that $\{X_t\}$ satisfies a
nonlinear autoregressive model of the form

(4.5)    $X_t = g(X_{t-1}) + e_t$

with the form of $g(\cdot)$ being unknown.

To apply the test $L_T(h_{\mathrm{test}})$ to determine whether $\{X_t\}$
follows a random walk model of the form $X_t = X_{t-1} + u_t$, we propose
the following procedure for computing the p-value of $L_T(h_{\mathrm{test}})$:

• For the real data set, compute $h_{\mathrm{test}}$ and $L_T(h_{\mathrm{test}})$.

• Let $X_1^* = X_1$. Generate a sequence of bootstrap resamples $\{e_t^*\}$
from $N(0, 1)$ and then $X_t^* = X_{t-1}^* + \hat{\sigma}_u e_t^*$ for
$2 \le t \le 432$.

• Compute the corresponding version $L_T^*(h_{\mathrm{test}})$ of $L_T$
based on $\{X_t^*\}$.

• Repeat the above steps $M$ times to find the bootstrap distribution of
$L_T^*(h_{\mathrm{test}})$ and then compute the proportion that
$L_T(h_{\mathrm{test}}) < L_T^*(h_{\mathrm{test}})$. This proportion is an
approximate p-value of $L_T(h_{\mathrm{test}})$.
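
The four steps above can be sketched in code. This is only an illustration under stated assumptions, not the authors' implementation: the function names `kernel_stat` and `bootstrap_pvalue` are hypothetical, the statistic is assumed to be the studentized kernel form suggested by the appendix quantities $a_{st}$ and $\eta_t$, the kernel is taken to be Gaussian, $\hat{\sigma}_u$ is taken to be the sample standard deviation of the first differences, and the bandwidth is held fixed rather than selected from the data.

```python
import numpy as np

def kernel_stat(x, h):
    """Studentized kernel statistic for H0: X_t = X_{t-1} + u_t (assumed form)."""
    u = np.diff(x)                       # residuals under the random-walk null
    lag = x[:-1]                         # lagged values X_{t-1}
    d = (lag[:, None] - lag[None, :]) / h
    k = np.exp(-0.5 * d**2) / np.sqrt(2.0 * np.pi)   # Gaussian kernel (assumption)
    np.fill_diagonal(k, 0.0)             # exclude the s == t terms
    num = u @ k @ u                      # sum over s != t of u_s K(.) u_t
    den = np.sqrt(2.0 * (u**2) @ (k**2) @ (u**2))
    return num / den

def bootstrap_pvalue(x, h, n_boot=200, seed=0):
    """Proportion of bootstrap statistics exceeding the observed statistic."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    stat = kernel_stat(x, h)
    sigma = np.std(np.diff(x), ddof=1)   # estimate of sigma_u (assumption)
    exceed = 0
    for _ in range(n_boot):
        e = rng.standard_normal(len(x) - 1)
        # X*_1 = X_1 and X*_t = X*_{t-1} + sigma * e*_t
        x_star = np.concatenate(([x[0]], x[0] + sigma * np.cumsum(e)))
        if stat < kernel_stat(x_star, h):
            exceed += 1
    return stat, exceed / n_boot
```

The statistic is recomputed on each resample with the same bandwidth, mirroring the third step, in which $L_T^*$ is evaluated at $h_{\mathrm{test}}$.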

Our simulation results return the simulated p-values of $\hat{p}_1 = 0.005$
for $L_0$ and $\hat{p}_2 = 0.011$ for $L_T(h_{\mathrm{test}})$. While both of
the simulated p-values suggest that there is not enough evidence to accept
the unit-root structure at the 5% significance level, there is some evidence
of accepting the unit-root structure based on $L_T(h_{\mathrm{test}})$ at the
1% significance level. When we also generated $\{e_t^*\}$ from a
non-Gaussian distribution, the simulated p-values were quite close.
By comparison, Jiang (1998) rejects the null hypothesis of nonstationarity
on the Fed data based on an application of an augmented Dickey-Fuller
unit-root test for $H_0 : \theta = 1$ in a linear model of the form
$X_t = \theta X_{t-1} + e_t$.

5. Conclusion and extensions. We have proposed a nonparametric spec- 
ification test for testing whether there is a kind of unit root structure in a 
nonlinear autoregressive mean function. An asymptotic normal distribution 
of the proposed test has been established. In addition, we have also pro- 
posed a simulation scheme to implement the proposed test in practice. The 
finite-sample results show that both the proposed test and the simulation 
scheme are practically applicable and implementable. 

[Fig. 1. Horizontal axis: Year, with ticks at 1965, 1972, 1980, 1988 and 1995.]

We may also consider a generalized form of model (1.1) with $\sigma_u$
replaced by a stochastic volatility function $\sigma(X_{t-1})$. In this
case, we should be considering a test for

(5.1)    $H_{01} : P(g(X_{t-1}) = X_{t-1} \text{ and } \sigma(X_{t-1}) = \sigma_u) = 1$.

In this case, we may use a kernel-based test of the form

(5.2)    $S_T(h) = \sum_{t=1}^{T} \sum_{s=1, s \ne t}^{T} \left( U_s K_{h_1}(X_{s-1} - X_{t-1}) U_t + V_s G_{h_2}(X_{s-1} - X_{t-1}) V_t \right)$,

where $G_{h_2}(\cdot) = G(\cdot/h_2)$ with $G(\cdot)$ being a probability
kernel function, $h = (h_1, h_2)$ is a pair of bandwidth parameters,
$U_t = Y_t - g(X_{t-1})$, $V_t = U_t^2 - \hat{\sigma}_u^2$, and
$\hat{\sigma}_u$ is an estimator of $\sigma_u$ under $H_0$. Similarly to
Theorems 2.2 and 3.1, we may establish two corresponding theorems for
$S_T(h)$. As the details for this case are lengthy and technical, we leave
this issue for future study.

Another possible extension will be on the multivariate case where a
multivariate autoregressive model is given as follows:

(5.3)    $X_t = g(X_{t-1}, \ldots, X_{t-p}) + e_t$.

In this case, we are interested in testing a null hypothesis of the form

(5.4)    $H_{02} : P\left( g(X_{t-1}, \ldots, X_{t-p}) = \sum_{j=1}^{p} \theta_j X_{t-j} \right) = 1$,

in which there is at least one unit root of the corresponding characteristic
polynomial. Detailed construction of such a test would involve some
estimation procedures for additive models as used in Gao, Lu and Tjøstheim
(2006) in the stationary spatial case and as proposed by Gao (2007) in the
stationary time series case. Since such an extension is not straightforward,
we also leave it as a future topic.
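
To make the unit-root condition in (5.4) concrete, the following sketch checks whether a given coefficient vector $(\theta_1, \ldots, \theta_p)$ yields a characteristic root on the unit circle. It assumes the common convention $1 - \theta_1 z - \cdots - \theta_p z^p$ for the characteristic polynomial, which the text does not spell out; the function names `char_roots` and `has_unit_root` are hypothetical.

```python
import numpy as np

def char_roots(theta):
    """Roots of 1 - theta_1 z - ... - theta_p z^p (assumed convention)."""
    theta = np.asarray(theta, dtype=float)
    # np.roots expects coefficients from the highest power down to the constant.
    coeffs = np.concatenate((-theta[::-1], [1.0]))
    return np.roots(coeffs)

def has_unit_root(theta, tol=1e-8):
    """True if some characteristic root has modulus one (up to tol)."""
    roots = char_roots(theta)
    return bool(np.any(np.abs(np.abs(roots) - 1.0) < tol))
```

For instance, $\theta = (1.5, -0.5)$ gives $1 - 1.5z + 0.5z^2 = 0.5(z - 1)(z - 2)$, which has the root $z = 1$; in general $z = 1$ is a root exactly when $\sum_j \theta_j = 1$.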

APPENDIX 

This appendix provides the proofs for some necessary technical lemmas
that are needed to complete the proofs of Theorems 2.1 and 2.2. Some
additional details are given in Appendices B-E of Gao et al. (2008).

Let $a_{st} = K_h\left(\sum_{i=s}^{t-1} u_i\right) = K\left(\frac{1}{h}\sum_{i=s}^{t-1} u_i\right)$
and $\eta_t = 2\sum_{s=1}^{t-1} a_{st} u_s$ for $t > s$. Note that we assume
without loss of generality that $\sigma_u^2 = E[u_t^2] = 1$ in this appendix.

Lemma A.1. Under the conditions of Theorem 2.1, we have under $H_0$

    $\sigma_T^2 = C_{10} T^{3/2} h (1 + o(1))$

for $T$ large enough, where $C_{10} = \frac{16 J_{02}}{3\sqrt{2\pi}}$ with
$J_{02} = \int K^2(x)\,dx$.
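Proof. Recall $a_{st} = K_h\left(\sum_{i=s}^{t-1} u_i\right) = K\left(\frac{1}{h}\sum_{i=s}^{t-1} u_i\right)$
and $\eta_t = 2\sum_{s=1}^{t-1} a_{st} u_s$. It follows under $H_0$ that

(A.1)    $\sigma_T^2 = E\left[\left(\sum_{t=2}^{T} \eta_t u_t\right)^2\right]$
         $= 4\sum_{t=2}^{T}\sum_{s=1}^{t-1} E[a_{st}^2 u_s^2 u_t^2] + 4\sum_{t=2}^{T}\sum_{s_1 \ne s_2 = 1}^{t-1} E[a_{s_1 t} a_{s_2 t} u_{s_1} u_{s_2} u_t^2]$
         $= 4\sum_{t=2}^{T}\sum_{s=1}^{t-1} E[a_{st}^2 u_s^2] + R_T$,

where $R_T = 4\sum_{t=2}^{T}\sum_{s_1 \ne s_2 = 1}^{t-1} E[a_{s_1 t} a_{s_2 t} u_{s_1} u_{s_2}]$.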

Let $u_{st} = \sum_{i=s+1}^{t-1} u_i$. Assumption 2.1(i), (ii) already
assumes that $\{u_i\}$ is a sequence of independent and identically
distributed random variables and has a symmetric probability density
function.

Let $f(x)$ and $f_{st}(x)$ be the density functions of $u_i$ and $u_{st}$,
respectively, and $g_{st}(x)$ be the density function of
$V_{st} = \frac{u_{st}}{\sqrt{t-s-1}}$. Clearly,
$f_{st}(x) = g_{st}\left(\frac{x}{\sqrt{t-s-1}}\right) \times \frac{1}{\sqrt{t-s-1}}$,
and by utilizing the usual normal approximation $V_{st} \to_D N(0, 1)$
as $t - s \to \infty$ under the conventional central limit theorem
conditions, it follows that as $t - s \to \infty$, $g_{st}(x) \to \phi(x)$
and $g_{st}\left(\frac{x}{\sqrt{t-s-1}}\right) \to C_0$ uniformly in $s$,
where $\phi(x) = \frac{1}{\sqrt{2\pi}} \exp\{-\frac{x^2}{2}\}$ and
$C_0 = \phi(0) = \frac{1}{\sqrt{2\pi}}$.
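Thus, for $t - s$ large enough, we have

(A.2)    $E[a_{st}^2 u_s^2] = \int\int K_h^2(u_{st} + u_s)\, u_s^2 f(u_s) f_{st}(u_{st}) \, du_s \, du_{st}$
         $= h \int\int K^2(y)\, x^2 f(x) f_{st}(hy - x) \, dx \, dy$
         $= h(1 + o(1)) \int\int K^2(y)\, x^2 f(x) f_{st}(x) \, dx \, dy$
         $= h(1 + o(1)) \int\int K^2(y)\, x^2 f(x)\, g_{st}\left(\frac{x}{\sqrt{t-s-1}}\right) \frac{1}{\sqrt{t-s-1}} \, dx \, dy$
         $= h(1 + o(1)) \frac{\int K^2(y)\,dy}{\sqrt{t-s-1}} \int x^2 f(x)\, g_{st}\left(\frac{x}{\sqrt{t-s-1}}\right) dx$
         $= h(1 + o(1))\, C_0 \frac{\int K^2(y)\,dy}{\sqrt{t-s-1}}$,

where the fact that $\int x^2 f(x)\,dx = E[u_1^2] = 1$ is used.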
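Choose some positive integer $\Gamma_T \ge 1$ such that $\Gamma_T \to \infty$
and $\frac{\Gamma_T}{\sqrt{T} h} \to 0$ as $T \to \infty$. Observe that

    $\sum_{t=2}^{T}\sum_{s=1}^{t-1} E[a_{st}^2 u_s^2] = \sum_{s=1}^{T-1}\sum_{t=s+1}^{T} E[a_{st}^2 u_s^2] = A_{1T} + A_{2T}$,

where $A_{1T} = \sum_{s=1}^{T-1}\sum_{1 \le (t-s) \le \Gamma_T} E[a_{st}^2 u_s^2] = O(T\Gamma_T) = o(T^{3/2} h)$
using the fact that $E[a_{st}^2 u_s^2] \le k_0^2 E[u_s^2] = k_0^2$ due to the
boundedness of the kernel $K(\cdot)$ by a constant $k_0 > 0$.

And it follows from (A.2) that

    $A_{2T} = \sum_{s=1}^{T-1}\sum_{\Gamma_T + 1 \le (t-s) \le T-1} E[a_{st}^2 u_s^2] = \frac{4\int K^2(y)\,dy}{3}\, C_0\, T^{3/2} h (1 + o(1))$.

It can then be seen that for $T$ large enough

(A.3)    $\sum_{t=2}^{T}\sum_{s=1}^{t-1} E[a_{st}^2 u_s^2] = \frac{4\int K^2(y)\,dy}{3\sqrt{2\pi}}\, T^{3/2} h (1 + o(1))$.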
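
The factor $4/3$ in (A.3) comes from summing the rate $(t-s-1)^{-1/2}$ in (A.2) over all pairs $s < t$. A quick numerical check of that summation step is sketched below; it is an illustration only, the helper name `normalized_lag_sum` is hypothetical, and the lag $k = t - s$ is used in place of $t - s - 1$ (an assumption that avoids the undefined lag-one term and does not change the limit).

```python
import numpy as np

def normalized_lag_sum(T):
    """(1 / T^{3/2}) * sum_{t=2}^{T} sum_{s=1}^{t-1} (t - s)^{-1/2}."""
    k = np.arange(1, T, dtype=float)     # lag k = t - s occurs T - k times
    return float(np.sum((T - k) / np.sqrt(k)) / T**1.5)
```

As $T$ grows the value approaches $4/3$ from below, since $\sum_{k=1}^{T-1} (T - k) k^{-1/2} \approx 2T^{3/2} - \frac{2}{3}T^{3/2}$.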
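To deal with $R_T$, we need to introduce the following notation: for
$1 \le i \le 2$,

(A.4)    $Z_i = u_{s_i}$,  $Z_{11} = \sum_{i=s_1+1}^{t-1} u_i$,  $Z_{22} = \sum_{j=s_2+1}^{s_1-1} u_j$,

ignoring the notational involvement of $s$, $t$ and others.

Let $f_{ii}(x_{ii})$ and $g_{ii}(x_{ii})$ be the probability density
functions of $Z_{ii}$ and $\frac{Z_{ii}}{\sigma_{ii}}$, respectively, with
$\sigma_{11}^2 = t - s_1 - 1$ and $\sigma_{22}^2 = s_1 - s_2 - 1$.

Clearly, $f_{ii}(x)$ is symmetric due to the symmetry of $f(x)$. Note that
$f_{ii}(x) = \frac{1}{\sigma_{ii}} g_{ii}\left(\frac{x}{\sigma_{ii}}\right)$ and
$f_{ii}'(x) = \frac{1}{\sigma_{ii}^2} g_{ii}'\left(\frac{x}{\sigma_{ii}}\right)$.

By utilizing the normal approximation of $\frac{Z_{ii}}{\sigma_{ii}} \to_D N(0, 1)$
as $\sigma_{ii} \to \infty$ under the usual central limit theorem conditions,
it follows that $g_{ii}(x) \to \phi(x)$ and
$g_{ii}\left(\frac{x}{\sigma_{ii}}\right) \to C_0$, with $C_0 = \frac{1}{\sqrt{2\pi}}$,
and $g_{ii}'(x) \to \phi'(x) = -x\phi(x)$, leading to
$\frac{\sigma_{ii}}{x}\, g_{ii}'\left(\frac{x}{\sigma_{ii}}\right) \to -\phi(0) = -C_0$
as $\sigma_{ii} \to \infty$.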

Similarly to (A.2), we can derive that as $\sigma_{11}^2 = t - s_1 - 1 \to \infty$
and $\sigma_{22}^2 = s_1 - s_2 - 1 \to \infty$,

(A.5)    $E[a_{s_1 t} a_{s_2 t} u_{s_1} u_{s_2}]$
         $= E\left[ K_h\left(\sum_{i=s_1}^{t-1} u_i\right) K_h\left(\sum_{j=s_2}^{t-1} u_j\right) u_{s_1} u_{s_2} \right]$
         $= E[Z_1 Z_2 K_h(Z_1 + Z_{11}) K_h(Z_1 + Z_2 + Z_{11} + Z_{22})]$
         $= \int \cdots \int x_1 x_2 K_h(x_1 + x_2 + x_{11} + x_{22}) K_h(x_1 + x_{11})$
           $\times f(x_1) f(x_2) f_{11}(x_{11}) f_{22}(x_{22}) \, dx_1 \, dx_2 \, dx_{11} \, dx_{22}$
         $= \frac{C_{11}(K)\, h^4 (1 + o(1))}{2\pi} \cdot \frac{1}{(\sqrt{t - s_1 - 1})^3 (\sqrt{s_1 - s_2 - 1})^3}$,

where the last step uses a change of variables to $y_{jj}$, Taylor
expansions, the relation between $f_{jj}'$ and $g_{jj}'$ noted above, and
the fact that $\int x_j f(x_j) f_{jj}(x_j)\,dx_j = 0$ due to the symmetry of
$f$ and $g_{jj}$. Here the conventional notation $\prod$ defines
$\prod_{j=1}^{k} p_j = p_1 p_2 \cdots p_k$, and

    $C_{11}(K) = \int\int y_{11} y_{22} K(y_{11}) K(y_{11} + y_{22}) \, dy_{11} \, dy_{22} = -\int y^2 K(y)\,dy$.
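
The identity $C_{11}(K) = -\int y^2 K(y)\,dy$ stated with (A.5) can be checked numerically for a specific kernel. The sketch below is an illustration only and assumes a standard Gaussian kernel, which the text does not prescribe; the function names are hypothetical and the integrals are approximated by Riemann sums on a truncated grid.

```python
import numpy as np

def gaussian_kernel(y):
    """Standard normal density, used here as an example kernel (assumption)."""
    return np.exp(-0.5 * y**2) / np.sqrt(2.0 * np.pi)

def c11_double_integral(step=0.02, limit=8.0):
    """Riemann-sum approximation of the double integral defining C_11(K)."""
    y = np.arange(-limit, limit + step, step)
    y1, y2 = np.meshgrid(y, y, indexing="ij")
    integrand = y1 * y2 * gaussian_kernel(y1) * gaussian_kernel(y1 + y2)
    return float(integrand.sum() * step**2)

def minus_second_moment(step=0.02, limit=8.0):
    """Riemann-sum approximation of -int y^2 K(y) dy."""
    y = np.arange(-limit, limit + step, step)
    return float(-(y**2 * gaussian_kernel(y)).sum() * step)
```

For the Gaussian kernel the second moment is one, so both functions should return values close to $-1$.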
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r 2 

Choose IV satisfying Ft — > oo and ^7=^ -> as T -> 00. Note that 

T t-1 

E E[a Slt a S2t u Sl u S2 ] = A 3T + A 4T + A 5T + A 6T , 

t=2 Sl ^S2=l 

where A 3T = E^i E^S+i ffiSi £[a, lt a. 3t u Sl it. a ] = 0(71* ) = o(T 3 / 2 /i) 
owing to £'[a Sl ta S2t ti sl n S2 ] < k 2 E\u Sl u S2 \ < k$ by the assumption that K(-) 
is bounded by k$: 

T-2 s 2 +F T T 

A AT = E E E E[a Slt a S2t u Sl u S2 ] 

S2=l Sl=S2 + l i=si+IY+l 
T-2 S2+IY T 

<E E E (^[< t <]) 1/2 (^K t <]) 1/2 

s 2 =i si=s 2 +i t=si+r T +i 

T-2 s 2 +V T T 

= o(i)E E E [/^-*i-i) 1/2 ] 1/2 

s 2 =l si=s 2 +l i=si+r T +l 
= 0(r T T 1 + 1 /4/ l l/2) =0 ( T 3/2 /l) _ 

Similarly to A^t, we have 

T-2 T t=si+r T 

At=E E E E[a slt a S2t u sl u S2 ] = o(T 3/2 h). 

S2 = lsi=S2+r T + l t=Sl+l 



Finally, owing to (A. 5), 

T-2 T t=T 



A 5T = E E E S[a Sl ta S2 tit sl 

s 2 =i si=s 2 +r T +i <=si+r T +i 



T-2 T T-1 -, , 

s fe Sl=S2 rr T+ i t=sl ^ T+ i (^/^) 3 ( v^T^) 3 

= 0(T 2 / l 4 ) = o(T 3 / 2 / l ). 

Thus, for $T$ large enough

(A.6)    $\sum_{t=2}^{T}\sum_{s_1 \ne s_2 = 1}^{t-1} E[a_{s_1 t} a_{s_2 t} u_{s_1} u_{s_2}] = 2\sum_{s_2=1}^{T-2}\sum_{s_1=s_2+1}^{T-1}\sum_{t=s_1+1}^{T} E[a_{s_1 t} a_{s_2 t} u_{s_1} u_{s_2}] = o(T^{3/2} h)$

using Assumption 2.1.

Therefore, (A.3) and (A.6) show that as $T \to \infty$

(A.7)    $\sigma_T^2 = \frac{16\int K^2(y)\,dy}{3\sqrt{2\pi}}\, T^{3/2} h (1 + o(1))$.

The proof of Lemma A.1 is therefore finished. □

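Lemma A.2. Assume that the probability space $(\Omega_n, \mathcal{F}_n, P_n)$
supports square integrable random variables $S_{n,1}, S_{n,2}, \ldots, S_{n,k_n}$,
and that the $S_{n,t}$ are adapted to $\sigma$-algebras $\mathcal{F}_{n,t}$,
$1 \le t \le k_n$, where

    $\mathcal{F}_{n,1} \subset \mathcal{F}_{n,2} \subset \cdots \subset \mathcal{F}_{n,k_n} \subset \mathcal{F}_n$.

Let $X_{n,t} = S_{n,t} - S_{n,t-1}$, $S_{n,0} = 0$ and
$U_{n,t}^2 = \sum_{s=1}^{t} X_{n,s}^2$. If $\mathcal{G}_n$ is a
sub-$\sigma$-algebra of $\mathcal{F}_n$, let
$\mathcal{G}_{n,t} = \mathcal{F}_{n,t} \vee \mathcal{G}_n$ (the
$\sigma$-algebra generated by $\mathcal{F}_{n,t} \cup \mathcal{G}_n$) and
let $\mathcal{G}_{n,0} = \{\Omega_n, \emptyset\}$ denote the trivial
$\sigma$-algebra. Moreover, suppose that

(A.8)    $\sum_{t=1}^{n} E\left( X_{n,t}^2 I\{|X_{n,t}| > \delta\} \mid \mathcal{G}_{n,t-1} \right) \to_P 0$

for some $\delta > 0$, and there exists a $\mathcal{G}_n$-measurable random
variable $u_n^2$, such that

(A.9)    $U_{n,k_n}^2 - u_n^2 \to_P 0$,

(A.10)   $\sum_{t=1}^{n} E(X_{n,t} \mid \mathcal{G}_{n,t-1}) \to_P 0$,

(A.11)   $\sum_{t=1}^{n} |E(X_{n,t} \mid \mathcal{G}_{n,t-1})|^2 \to_P 0$.

If

(A.12)   $\lim_{\delta \to 0} \liminf_{n \to \infty} P\{U_{n,k_n} > \delta\} = 1$,

then as $n \to \infty$

    $\frac{S_{n,k_n}}{U_{n,k_n}} \to_D N(0, 1)$.

Proof. The proof follows from Corollary 3.1 and Theorem 3.4 of Hall
and Heyde (1980). □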

Lemma A.3. Under the conditions of Theorem 2.2, we have as $T \to \infty$
(A.13) ^1^ d at(o,1). 

0~T 
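Proof. We apply Lemma A.2 to prove Lemma A.3. Let
$Y_{Tt} = \frac{\eta_t u_t}{\sigma_T}$,
$\Omega_{T,s} = \sigma\{Y_{Tt} : 1 \le t \le s\}$ be the $\sigma$-field
generated by $\{Y_{Tt} : 1 \le t \le s\}$,
$\mathcal{G}_T = \Omega_{T,P(T)}$ and $\mathcal{G}_{T,s}$ be defined by

(A.14)   $\mathcal{G}_{T,s} = \Omega_{T,P(T)}$ for $1 \le s \le P(T)$, and $\mathcal{G}_{T,s} = \Omega_{T,s}$ for $P(T) + 1 \le s \le T$,

where $P(T) \ge 1$ is chosen such that $P(T) \to \infty$ and
$\frac{P(T)}{T} \to 0$ as $T \to \infty$. Let

    $U_{P(T)}^2 = \frac{\tilde{\sigma}_{P(T)}^2}{\sigma_{P(T)}^2}$,

where $\tilde{\sigma}_T^2 = 2\sum_{t=1}^{T}\sum_{s=1, s \ne t}^{T} a_{st}^2 u_s^2 u_t^2$
and $\sigma_T^2 = \mathrm{var}\left[\sum_{t=2}^{T} \eta_t u_t\right]$, with
$\tilde{\sigma}_S^2$ and $\sigma_S^2$ defined analogously for all
$1 \le S \le T$.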

In view of Lemma A. 2 above, in order to prove that as T — > oo 

(A.15) M?± = ^J2 7]tUt ^ D N(0,l), 

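it suffices to show that for all $\delta > 0$,

(A.16)   $\sum_{t=2}^{T} E\left[ Y_{Tt}^2 I\{|Y_{Tt}| > \delta\} \mid \Omega_{T,t-1} \right] \to_P 0$,

(A.17)   $\frac{\tilde{\sigma}_T^2}{\sigma_T^2} - U_{P(T)}^2 \to_P 0$,

(A.18)   $\sum_{t=2}^{T} E[Y_{Tt} \mid \mathcal{G}_{T,t-1}] = \sum_{t=2}^{P(T)} Y_{Tt} + \sum_{t=P(T)+1}^{T} E[Y_{Tt} \mid \Omega_{T,t-1}] = \sum_{t=2}^{P(T)} Y_{Tt} \to_P 0$,

(A.19)   $\sum_{t=2}^{T} |E[Y_{Tt} \mid \mathcal{G}_{T,t-1}]|^2 = \sum_{t=2}^{P(T)} Y_{Tt}^2 + \sum_{t=P(T)+1}^{T} |E[Y_{Tt} \mid \Omega_{T,t-1}]|^2 = \sum_{t=2}^{P(T)} Y_{Tt}^2 \to_P 0$,

(A.20)   $\lim_{\delta \to 0} \liminf_{T \to \infty} P\left( \frac{\tilde{\sigma}_T}{\sigma_T} > \delta \right) = 1$.

The proof of (A.18) is similar to that of (A.19), which follows from

(A.21)   $\sum_{t=2}^{P(T)} E[Y_{Tt}^2] = O\left( \left( \frac{P(T)}{T} \right)^{3/2} \right) \to 0$

as $T \to \infty$, in which Lemma A.1 has been used.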
In order to prove (A.16), it suffices to show that

(A.22)   $\frac{1}{\sigma_T^4} \sum_{t=2}^{T} E[\eta_t^4] \to 0$.

The proof of (A.22) is given in Lemma A.4 below. The proof of (A.17) is
given in Lemma A.5 below.

The proof of (A.20) follows from

(A.23)   $\frac{\tilde{\sigma}_T^2}{\sigma_T^2} \to_D \xi^2 > 0$

for some random variable $\xi^2$. The proof is given in Lemma A.6 below. □

Lemma A.4. Under the conditions of Theorem 2.2, we have

(A.24)   $\lim_{T \to \infty} \frac{1}{\sigma_T^4} \sum_{t=2}^{T} E[\eta_t^4] = 0$.

Proof. Observe that

(A.25)   $E[\eta_t^4] = 16 \sum_{s_1=1}^{t-1}\sum_{s_2=1}^{t-1}\sum_{s_3=1}^{t-1}\sum_{s_4=1}^{t-1} E[a_{s_1 t} a_{s_2 t} a_{s_3 t} a_{s_4 t} u_{s_1} u_{s_2} u_{s_3} u_{s_4}]$.

We mainly consider the cases of $s_i \ne s_j$ for all $i \ne j$ in the
following proof. Since the other terms involve at most triple summations,
we may deal with such terms similarly. Without loss of generality, we only
look at the case of $1 \le s_4 < s_3 < s_2 < s_1 \le t - 1$ in the following
evaluation. Let

    $\sum_{i=s_1}^{t-1} u_i = u_{s_1} + \sum_{i=s_1+1}^{t-1} u_i$,

    $\sum_{i=s_2}^{t-1} u_i = u_{s_1} + u_{s_2} + \sum_{i=s_2+1}^{s_1-1} u_i + \sum_{j=s_1+1}^{t-1} u_j$,

    $\sum_{i=s_3}^{t-1} u_i = u_{s_1} + u_{s_2} + u_{s_3} + \sum_{k=s_3+1}^{s_2-1} u_k + \sum_{i=s_2+1}^{s_1-1} u_i + \sum_{j=s_1+1}^{t-1} u_j$,

    $\sum_{i=s_4}^{t-1} u_i = u_{s_1} + u_{s_2} + u_{s_3} + u_{s_4} + \sum_{l=s_4+1}^{s_3-1} u_l + \sum_{k=s_3+1}^{s_2-1} u_k + \sum_{i=s_2+1}^{s_1-1} u_i + \sum_{j=s_1+1}^{t-1} u_j$.

Similarly to (A.4), let again $Z_i = u_{s_i}$ for $1 \le i \le 4$,

    $Z_{11} = \sum_{i=s_1+1}^{t-1} u_i$,  $Z_{22} = \sum_{j=s_2+1}^{s_1-1} u_j$,
    $Z_{33} = \sum_{k=s_3+1}^{s_2-1} u_k$,  $Z_{44} = \sum_{l=s_4+1}^{s_3-1} u_l$.

By the same arguments as in the proof of (A. 5), we have 



E 



n a ^u Sl 



.i=i 



E 



3=1 



<i=l 



P (/ Kk i^ 1 ^ + XU ^j X 3f( X 3)fj3( X 3j) dx 3 dx 33 



using yu 



%i H~~ %i 



+^ 4 n 



(A.26) 



J K yuj x jf( x j)fjj( x j - h Vjj) dx i d v. 
using Taylor expansions and J xjf(xj)fjj(xj)dxj =0 
yjj K ( u jj) x jf( x j)fjj( x j) dx J d Vjj 

h 8 (l + o(l)) n / yjjK(ujj) d yjj ■ J ■r J f(,r j }f' ji (,r j } ca- 
using f' ii (x)=g'J—)- 



fc 8 (i+o(i))n 

3=1 
4 



n 



C 22 {K)h*{l + o(l)) 

11; 1 °ji j=l 
(722(^)^(1 + 0(1)) 



4vr 2 



n 



i 



X<j JJ \ o 



'33 



dx,. 



where lijj = X)i=i is used to shorten some expressions, and 



C 22 (K) = J] (j VjjK ^ »„j dy^ < oo. 



Hence, similarly to (A.3), we have

(A.27)   $\sum_{t=2}^{T} \sum_{1 \le s_4 < s_3 < s_2 < s_1 \le t-1} E[a_{s_1 t} a_{s_2 t} a_{s_3 t} a_{s_4 t} u_{s_1} u_{s_2} u_{s_3} u_{s_4}] = o(T^3 h^2)$

using Assumption 2.1.

Analogously, we can deal with the other terms of (A.25) as follows:

(A.28)   $\sum_{t=2}^{T} \sum_{1 \le s_2 \ne s_1 \le t-1} E[a_{s_1 t}^2 a_{s_2 t}^2 u_{s_1}^2 u_{s_2}^2] = o(T^3 h^2)$,

(A.29)   $\sum_{t=2}^{T} \sum_{1 \le s_3 \ne s_2 \ne s_1 \le t-1} E[a_{s_1 t}^2 a_{s_2 t} a_{s_3 t} u_{s_1}^2 u_{s_2} u_{s_3}] = o(T^3 h^2)$,

(A.30)   $\sum_{t=2}^{T} \sum_{1 \le s_2 \ne s_1 \le t-1} E[a_{s_1 t}^3 a_{s_2 t} u_{s_1}^3 u_{s_2}] = o(T^3 h^2)$.

Thus, we have finished the proof of (A.24) using (A.25)-(A.30). □
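Lemma A.5. Let the conditions of Theorem 2.2 hold. Then as $T \to \infty$

(A.31)   $\frac{\tilde{\sigma}_T^2}{\sigma_T^2} - U_{P(T)}^2 \to_P 0$.

Proof. For $1 \le S \le T$, recall $U_S^2 = \frac{\tilde{\sigma}_S^2}{\sigma_S^2}$,
where $\tilde{\sigma}_S^2 = 2\sum_{t=1}^{S}\sum_{s=1, s \ne t}^{S} a_{st}^2 u_s^2 u_t^2$.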

To use simplified notation in the proof of this lemma, we introduce the 
following lower-case notation: m = T, n = P(T), = cr-p, a 2 = cr 2 p^ T y and 
for 1 < i < n, 1 < j < i — 1, 

/t-l \ j i-l 

(A. 32) eij = (Ui-E[ul])KU^ui\uj and A mi = — ^ e ij> 

\l=j ) a ™ j=i 

(a.33) v 2 = E ^ f E A u ) = E ^ ( E «f + 4 

Note that A" mi = -^-(u 2 - E[u 2 ])vf with E[X mi \ = 0. 
Observe that 

~ 2 ~2 m n / 1 m 1 n \ 

^ - £ ^-£«? - ^E-l 

"i ™ i=l j=l \ "i i=l n i=l / 

(A ' 34) 

— -^rnn ~H E\u^\J mn , 

where 7 mn = E^i X mi - £™ =1 X nj and J mn = YT=i ~ £ E"=i 4 

In view of (A.33), in order to prove (A.31), it suffices to show that as 
m, n — > oo 

(A.35) 7 mn and J mn -> P 0. 
We now prove the first part of (A.35). In view of the fact that the
independence of $\{u_i\}$ implies for $n + 1 \le i \le m$ and $1 \le j \le n$,



E[X m i{X m j — X n j)] 



1 SE*-B[^])] 



ff 4 ff 2 



X E 



E[4\)K 2 h [J2 u p) u M E% K 



\p=k 



0. 



we have 

E^mn] = E 



(A.36) 



E Xmi E ^raj 



2 



1 



i=n+l 



-i=n+l j'=l 
: 2 

■3=1 



-4 E E(u}-E\u*]fE[vt\ + 



i=n+l 



[e 2 m - °IY 
a 4 a 4 



xY^E(n 2 -E[ul}) 2 E[v^}. 

3=1 



We start by looking at $\sum_{i=n+1}^{m} E[v_i^4]$ and $\sum_{j=1}^{n} E[v_j^4]$
in order to complete the proof of the first part of (A.35). Before we
compute the two terms, we have a look at how to prove the second part of
(A.35). Note that



(A.37) 



E[Jmn\ - E 



E 



m i=1 



n j=1 



E ^ 2 + ^^E 



2 n 



i=n+l 



\e 



E 



a 2 a 2 



, (°-l-° 2 m) 2 E 



i=n+l 

2 m n 



a 4 a 4 

y my n 



E^ 

■3=1 



2 

, r, a n ~ a m \ ~* \ " j^r 2 2i 

+ 2 —4^ E E* "ji- 



ff 4 o- 2 

m n i=n+lj=l 
We first deal with (A.37) term by term. Recalling
$a_{ji} = K_h\left(\sum_{l=j}^{i-1} u_l\right)$, we have



/ m \ 2 

4 E «? =e 

\i=n+l / 



m m 



2 2 

V; V 



E E "t"3 
.i=n+l j=n+l 

(A.38) 

m m m 

= E E ^t\ + E E « 

i=n+l i=n+l j=n+l j't^i 

Observe that 

;-i j-i 

i%\ 2 ] = EE^ a ^H^] 

c=ld=l 

(a.39) =EE £7 [°^c4 u S] +EE^[«^c«|^] 

c=l d=l c=j d=l 



where 



^ = E E ^^c^c !^] 

c=l d=l 

(A.40) = E E[a 2 ci a 2 cj ut] + 2 E E ^[4«c4 u S 

c=l c=2d=l 

= Jy(l)+^-(2), 

i-lj'-l 

(A.4i) jy = e E ^[4«c4 u 3] = E E £[4«c»M 

c=j d=l c=j d=l 

using the fact that $\{u_k : j \le k \le i - 1\}$ and
$\{u_l : 1 \le l \le j - 1\}$ are all mutually independent.

Thus, we need only to evaluate ^"=2^=1 ^° do so ' we introduce an- 
other set of simplified symbols: Z X \ = Efc=d+i^fc, ^22 = Efc=c+i n fc' z 33 = 
J2]=j u u %i = u d an d Z2 = u c . In this case, we have the following decompo- 
sitions: for 1 < d < c— 1, 1 < d < j — 1 and 1 < j < i — 1, 

E^= u c+ E +E n « = ^ + -Z"22 + ^33, 
Z=c Z=c+1 l=j 

j-l c-1 J-l 

E«fc=«d+ E U k+U c + E U ^ = Z l + ^2 + #11 + ^22- 
k=d k=d+l k=c+l 
By the same arguments as used in the proof of Lemma A.1, we have
E[a 2 ci u 2 c a 2 CJ u 2 c ] = E[K 2 {Z 2 + Z 22 )K 2 (Z 2 + Z 22 + Z 33 )Z A \ 
= / J K\(x 2 + x 22 )Kl(x 2 +x 22 + x 33 ) 

X xjf(x 2 )f 22 (x 22 )f 33 (x 33 ) dx 2 dx 22 dx 33 

X 2 + X 22 X 33 

using y 2 = x 2 ,y 22 = , y 33 = — 



h 2 



■ J K 2 {y 22 )K 2 (y 22 + y 33 )y A 2 

x / (2/2)/22(j/2 - 2/22^/33(^2/33) dy 2 dy 22 dy 33 
h 2 (l + o(l))( f K 2 {u)du) 



x ^ xif(x 2 )f 22 (x 2 )dx 2 ^jf 33 (0), 

where /&(■) denotes the marginal density of Za and /(•) denotes the density 
of Zi. 

Similarly, we have 



E [ali u l a % u d} = E 



K 2 (j2(Zi + Zu)\ K 2 (Z 2 + Z 22 + Z 33 )Z 2 Z 
I I K ^ i^-^ Xi + Xii ^j K ^°° 2 + X22 + X33 - ) 



2 



x 



\ x 2 f(xi)fii(xii) dxi dxa f 33 (x 33 ) dx 33 



=1 



/i 3 (l+o(l)) J - J K 2 ( yil +y 22 )K 2 (y 22 + y 33 )y 2 y 2 
x /(2/i)/(2/2)/n(2/l - 2/n^) 

X /22(2/2 - 2/22/l)/33(0) 



x dyi <% 2 ^2/11 ^2/22 %33 
2 



/i 3 (l + o(l))(y K 2 (u)d^ / 33 (0) 
x (J x\f{xi)fii(xi)dx^j(^J x\f{x 2 )f 22 {x 2 )dx^ 




Using the same arguments as used in the calculations of (A. 2), (A. 3) and 
(A. 7), we have 

m i— 1 

32 E £j y (l) = C? (m-n) 3 fc 2 (l + O (l)) 



i=n+l 1=1 

(A.42) 



o*_ n (l + o(l)), 



m i—1 



(A.43) 



32 E J ^( 2 ) = C{m-n) 2 h 3 {l + o(l)) 
i=n+l j=l 

o((T m _ n ), 

where C\q is as defined in Theorem 2.1 and C > is a positive constant. 
Similarly, by (A. 41) we have 

771 % — 1 777 % — 1 % — 1 j — 1 

32 E E J ^( 2 ) = 32 E EEE^ C 2 ]^!^] 

i=n+l j=l i=n+l j=l c=j d=l 

= 0((T m _ n J. 

Hence, (A.39)-(A.44) imply for m and n large enough, 
E 



E E 



E E 

i=n+l j=n+l 



2 2 

,i=n+l j=n+l,jyt=i 

(A.45) 

= ^_ n (l + o(l)), 

where cr^ is as defined above (A. 32). 

Analogously to (A. 5), we can show that for m and n large enough, 



m i—1 i—1 

2 2i 



E ^K 4 ]= E EE £[44* 

i=n+l i=n+l s=l t=l 

m i—1 s— 1 



(A.46) =0(^) U YJU ' ' 

i=n+ls=2t=l V * 6 V * L 

= 0(h 2 (m 2 -n 2 )) = o(a 4 m _ n ). 
Similarly to (A.45), we can show that for n large enough, 

n n i—1 i—1 

i=l i=ls=lt=l 

n i—1 s—1 I 

= 0(fcV) = o(^). 
Analogously to (A.45), we also have for $m$ and $n$ large enough,



(A.48) 



i=n+lj'=l i=n+lj=l 



= cr m-n^n( 1 + °( 1 ))- 

Therefore, (A.37)-(A.48) imply that as m,re^ oo 



2 1 

inn J 



1 



or 



m 

E^ 

i=l 
m 



1 



E^ 
3=1 



i=n+\ 
m 

E -I 

-i=n+l 



rr 2 - rr 2 n 

^2 a 2 2^ w j 

"m^n j—i 

H — ^ 

rT (T 



Lj=l 



2 2 
r) ®n 

u m u n i=n+lj=1 



(m — n)" 



+ 



e 

~ij=i 

m 3/2 _ n 3/2^j2 



( m 3 / 2 — n 3 / 2 )(m — n) 3//2 



x (l + o(l)) 

_> (1 _ r )3 + (1 _ r 3/2)2 _ 2(1 _ r 3/2 )(1 _ r) 3/2 
= ((l-r) 3 / 2 -(l-r 3 / 2 )) 2 >0 
using a 2 m = ^M m 3 / 2 h, a 2 n = fj^n^h and r = lim m ,™ £. 



Since $r = 0$ from the construction in the beginning of the proof of Lemma
A.3 above, we have therefore shown the second part of (A.35). We now turn
to the first part of (A.35). Using the results that
$\sum_{i=n+1}^{m} E[v_i^4] = o(\sigma_{m-n}^4)$ and
$\sum_{j=1}^{n} E[v_j^4] = o(\sigma_n^4)$, the proof of the first part of
(A.35) follows from (A.36). We therefore have completed the proof of
Lemma A.5. □



Define a random variable $N(T)$ in the same way as $T(n)$ that is defined
in Karlsen and Tjøstheim (2001) [see Appendix B of Gao et al. (2008) for
more details]. Recall

(A.49)   $C_{10} = \frac{16 J_{02}}{3\sqrt{2\pi}}$ and $\sigma_T^2 = C_{10} T^{3/2} h$.
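
To give a sense of the magnitudes in (A.49), the sketch below evaluates $J_{02}$, $C_{10}$ and the leading term of $\sigma_T^2$ numerically. It assumes a standard Gaussian kernel purely for illustration (the text does not fix a kernel here), and the function names are hypothetical.

```python
import numpy as np

def j02_gaussian(step=0.001, limit=10.0):
    """Riemann-sum approximation of J_02 = int K^2(x) dx for a Gaussian K."""
    x = np.arange(-limit, limit + step, step)
    k = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
    return float(np.sum(k**2) * step)

def c10(j02):
    """C_10 = 16 J_02 / (3 sqrt(2 pi)), as in (A.49)."""
    return 16.0 * j02 / (3.0 * np.sqrt(2.0 * np.pi))

def sigma_T_squared(T, h, j02):
    """Leading term C_10 T^{3/2} h of sigma_T^2."""
    return c10(j02) * T**1.5 * h
```

For the Gaussian kernel, $J_{02} = \frac{1}{2\sqrt{\pi}}$, so $C_{10} = \frac{8}{3\sqrt{2}\,\pi}$.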
Lemma A.6. Let the conditions of Theorem 2.2 hold. Then as $T \to \infty$



U 



(a.50) 4- >d e 



a 



with £ 2 = ^-Mij2{i), where M 1 / 2 ( - ) is a special case of the Mittag-Leffer 
process $M_\beta(\cdot)$ for $\beta = \frac{1}{2}$ as described by Karlsen
and Tjøstheim (2001), page 388.

Proof. Observe that 



^ 2 
<7<p 



2E( E «)«* a =2Efi;a2 t «2k 2 -2f;4«f. 

t=l \s=l,s^ / i=l \s=l i=l 



Similarly to computations made between (A. 5) and (A. 6), it can be shown 
that 



E( E 4(«i-4 J 

t=l Vs=l,s^i / 



2 

.4 • 



o{a T ) 



(A.51) £ 
using E[ui\ = 1. 

Let $Q(u) = \frac{K^2(u)}{J_{02}}$. Then $Q(\cdot)$ is a probability kernel.
Applying Lemma C.1 in Appendix C of Gao et al. (2008), we may show that as
$T \to \infty$



E AT(T\h E^l 



T ( 1 T 



(A-52) ^E v^Eof^^^VlK 



1 T 



where we have used the result that $\pi_s(Q) = \int Q(u)\,du = 1$ [see the
discussion at the end of Appendix B of Gao et al. (2008)].

Meanwhile, Theorem 3.2 of Karlsen and Tjøstheim (2001), page 389, is
applicable to the current case of $X_t = X_{t-1} + u_t$ under $H_0$ to show
that as $T \to \infty$

(A.53) _^)^ Ml/2 (l), 

when the slowly varying function L s (T) in this case is L s (T) = Lq = . 
Thus, along with a strengthened version of Theorem 5.1 of Karlsen and
Tjøstheim (2001), (A.51)-(A.53) imply as $T \to \infty$

T / T \ 

', a 'st u 's I «t 



2 



cr, 



(A.54) 



T t= i \ s= l 

T / T 



3 

CiaTzh t =i \ s =i / 
2L iV(r) 1 t 2 2 _ 



2L J 02 A(T) 1 1 * /Xs-i-Xt-! 

C 10 L VTT£l\N(T)h^ 



-DyM 1/2 (l)E^ 

where we have used the facts that $\{u_s\}$ is a sequence of i.i.d. random
errors with $E[u_1] = 0$ and $E[u_1^2] = 1$ and that
$\{a_{st}^2 u_s^2 : 1 \le s \le t - 1\}$ is independent of $u_t$. Therefore,
(A.51)-(A.54) complete the proof of Lemma A.6. □